String Manipulation with stringr in R

Exercise

Parsing age and gender into pieces

To finish up, you need to pull out the individual pieces and tidy them into usable variables.

There are a few ways you could get at one piece: you could extract the piece you need, you could replace everything that isn't the piece you need with "", or you could try to split the string into the pieces you need. You'll try a few of these in this exercise and you'll see yet another way in the next chapter. For the second option, stringr has a nice convenience function, str_remove(), that works like str_replace() with replacement = "".
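As a rough illustration of the three approaches, here is a minimal sketch. The sample values in age_gender are made up, and the plain-regex strings age and unit are assumed stand-ins for the patterns built earlier in the course (their exact definitions are not shown on this page).

```r
library(stringr)

# Hypothetical sample values; the real age_gender comes from the course data
age_gender <- c("19YOM", "2 MOF", "31 YOF")

# Assumed plain-regex stand-ins for the course's age and unit patterns
age  <- "\\d{1,2}"
unit <- "\\s?(YO|YR|MO)"

# Option 1: extract the piece you need
ages <- str_extract(age_gender, age)

# Option 2: remove everything that isn't the piece you need
genders <- str_remove(age_gender, str_c(age, unit))

# str_remove() gives the same result as str_replace() with replacement = ""
genders_alt <- str_replace(age_gender, str_c(age, unit), "")

# Option 3: split on the unit, leaving age and gender as separate pieces
pieces <- str_split(age_gender, unit, simplify = TRUE)
```

Here pieces is a character matrix with one row per input string, the age in the first column and the gender in the second.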

One benefit of building up your pattern in pieces is you already have patterns for each part that you can reuse now.

Instructions
  • 1
    • Use str_extract() with your age pattern to extract just the age from age_gender, then transform it to a number with as.numeric().

  • 2
    • Create genders by using str_remove() with your age %R% unit pattern to replace everything except the gender with "".
    • genders has a few extra spaces; remove them.
  • 3
    • Get time_units by using str_extract() on age_gender with your unit pattern.
    • To know if the units are months or years we just need the first character after any spaces. Use str_extract() on time_units with the pattern WRD to get time_units_clean.
    • Complete the final line to convert any ages reported in months to an age in years.
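
One way the three steps could fit together is sketched below. This is not the course's reference solution: the sample data is invented, age and unit are assumed plain-regex equivalents of the course's patterns, str_c(age, unit) stands in for age %R% unit, and "\\w" stands in for WRD. The intermediate names genders_clean and ages_years are hypothetical.

```r
library(stringr)

# Hypothetical sample values; the real age_gender comes from the course data
age_gender <- c("19YOM", "2 MO F", "31 YOF", "6MOM")

# Assumed plain-regex stand-ins for the course's age and unit patterns
age  <- "\\d{1,2}"
unit <- "\\s?(YO|YR|MO)"

# Step 1: extract the age and convert it to a number
ages_numeric <- as.numeric(str_extract(age_gender, age))

# Step 2: remove age and unit so only the gender is left, then drop spaces
genders <- str_remove(age_gender, str_c(age, unit))
genders_clean <- str_remove_all(genders, " ")

# Step 3: extract the unit, then keep its first word character ("Y" or "M")
time_units <- str_extract(age_gender, unit)
time_units_clean <- str_extract(time_units, "\\w")

# Convert ages reported in months to years; leave ages in years unchanged
ages_years <- ifelse(time_units_clean == "Y", ages_numeric, ages_numeric / 12)
```

Extracting only the first word character is enough here because the assumed units (YO, YR, MO) differ in their first letter, so the check against "Y" separates years from months.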