Introducing stringr

1. Introducing stringr

When you are processing strings, your first requirement is probably a set of tools that is comprehensive and powerful. Something that allows you to complete most string processing tasks you'll come across. But in addition, you'll want functions that are easy to learn and easy to use, so you can get your tasks done quickly and with minimal frustration.

2. stringr

I think stringr is the best package out there that meets these two requirements. stringr is built on top of the stringi package. stringi provides a very comprehensive set of string processing functions, but the breadth of functions and options can be overwhelming. stringr distills just the most common operations, along with the most useful options, into a more manageable set to learn. A lot of effort has gone into making stringr concise and consistent. Nearly all stringr functions start with the prefix str_ (a few helpers, such as regex() and fixed(), do not). That prefix is a good way to recognize when someone is using a stringr function, and it also helps when you are looking for one: type str_, hit TAB, and you'll see the stringr functions as possible completions. Since the functions do things to strings, the str_ functions also take a vector of strings as their first argument.
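To make those two conventions concrete, here is a minimal sketch. The vector fruit_names is made up for illustration, and the ls() call is just a script-friendly stand-in for the TAB completion described above.

```r
library(stringr)

# Hypothetical input: a character vector, including a missing value
fruit_names <- c("apple", "banana", NA)

# Each function takes the vector of strings as its first argument
n_chars <- str_length(fruit_names)       # number of characters per element
shouted <- str_to_upper(fruit_names)     # upper-case version of each element
has_an  <- str_detect(fruit_names, "an") # does each element contain "an"?

# Roughly what TAB completion on str_ shows: the exported str_ functions
str_functions <- ls("package:stringr", pattern = "^str_")
head(str_functions)
```

Because the string vector always comes first, the functions are easy to chain and easy to guess at once you know the prefix.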

3. str_c()

Your first stringr function is str_c, which is the stringr version of paste. Actually, it is a bit closer to paste0, since it uses an empty string as the separator by default. Take a look at the first step in your pizza order using str_c instead of paste: the only difference is that we don't have to specify the separator, since the default is what we want. This is an example of a stringr function performing a similar operation to a base function, but using a default that is more likely to be what you want. After str_c you'll learn about two other stringr functions that do similar things to base R functions but have nicer behavior: str_length and str_sub.

4. Babynames

The phenomenon of celebrities giving their children stupid names is widely considered to have peaked when actress Gwyneth Paltrow and singer Chris Martin named their daughter after their favorite brand of laptop. In this chapter you'll explore how the general population has named their offspring, using the babynames data set. babynames has the number of babies given each name in the USA from 1880 to 2014, as recorded by the Social Security Administration. The n column is a count of how many babies received the name. You'll just look at the year 2014, and often pull out the unique names into a vector. Just be aware that when you ask questions without considering n, you are drawing conclusions only about the set of unique names given in 2014, not about their actual frequency among babies born that year.
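The caveat about n is easier to see with numbers. The sketch below uses a tiny made-up data frame, toy, with the same kind of columns as babynames; the names and counts are invented for illustration and are not real Social Security Administration figures.

```r
library(stringr)

# Toy stand-in for babynames (invented counts, NOT real data)
toy <- data.frame(
  year = c(2014, 2014, 2014, 2013),
  sex  = c("F", "F", "M", "F"),
  name = c("Emma", "Zyla", "Noah", "Emma"),
  n    = c(100, 5, 90, 95),
  stringsAsFactors = FALSE
)

# Keep only 2014, then pull out the unique names as a vector
toy_2014   <- toy[toy$year == 2014, ]
names_2014 <- unique(toy_2014$name)

# Question asked of unique names: each name counts once, whatever its n
share_of_names <- mean(str_sub(names_2014, 1, 1) == "Z")

# Same question asked of babies: weight each name by n
starts_z <- str_sub(toy_2014$name, 1, 1) == "Z"
share_of_babies <- sum(toy_2014$n[starts_z]) / sum(toy_2014$n)
```

In this toy data a third of the unique names start with "Z", but only 5 of the 195 babies have such a name, which is exactly the gap between conclusions about names and conclusions about babies.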

5. Let's practice!

OK, time for you to learn about str_c and another way it differs from paste: the way it handles missing values.
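As a preview, here is a minimal sketch of both differences. The pizza strings and the toppings vector are hypothetical stand-ins, not the exact code from the pizza order.

```r
library(stringr)

# Default separator: paste() uses a space, str_c() uses an empty string
with_paste  <- paste("Small", "pizza")            # space inserted
with_paste0 <- paste0("Small", "pizza")           # no separator
with_str_c  <- str_c("Small", "pizza")            # no separator, like paste0()
with_sep    <- str_c("Small", "pizza", sep = " ") # separator given explicitly

# Missing values: paste() turns NA into the text "NA",
# while str_c() keeps the result missing
toppings <- c("cheese", NA, "olives")
paste_na <- paste("with", toppings)
str_c_na <- str_c("with ", toppings)
```

Here paste_na contains the literal string "with NA", whereas the second element of str_c_na is NA, so a missing input stays visibly missing.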