String manipulation

Sometimes a variable is coded in a way that R cannot read as a number. For example, large integers are sometimes written with a comma as the thousands separator. In these cases, R interprets the variable as a factor or a character vector instead of a numeric one.

In some cases you could use the dec argument in read.table() to get around this, but if the data also includes decimals separated by a dot, this is not an option. To get rid of the unwanted commas, we need string manipulation.
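As a small illustration of the problem described above, the sketch below reads a made-up table (the column names and values are invented for this example, not taken from the human data) and checks how R types each column. It assumes the values with commas are quoted in the input.

```r
# Made-up input: GNI uses a comma as thousands separator,
# share uses a dot as decimal separator (both are invented values)
raw <- 'country GNI share
A "64,992" 0.94
B "1,123" 0.35'

# Read the text as a table; stringsAsFactors = FALSE keeps text as character
df <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# GNI is read as character because of the commas, share is read as numeric
str(df)
```

With stringsAsFactors = TRUE, the GNI column would be read as a factor instead of a character vector.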

In R, strings have the basic type character and can be created with quotation marks or with specific functions. Base R has quite a few functions for manipulating character strings, but the tidyverse package stringr offers a somewhat more consistent and simpler alternative.
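The following base R sketch shows a few ways to create character strings and one way to strip commas without stringr. The values are made up for illustration, and gsub() is just one of several base R functions that could be used here.

```r
# Create strings with quotation marks or with functions
x <- "64,992"                 # quotation marks
y <- as.character(1123)       # conversion from a number
z <- paste("GNI", "value")    # pasting pieces together

class(x)                      # check the basic type

# Remove every comma with base R; fixed = TRUE treats "," as a literal
cleaned <- gsub(",", "", x, fixed = TRUE)

# The cleaned string can then be converted to a number
num <- as.numeric(gsub(",", "", "1,234,567", fixed = TRUE))
```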

This exercise is part of the course

Helsinki Open Data Science

Exercise instructions

  • Access the stringr package
  • Look at the structure of the Gross National Income (GNI) variable in human
  • Execute the sample code where the comma is removed from each value of GNI.
  • Adjust the code: Use the pipe operator (%>%) to convert the resulting vector to numeric with as.numeric.
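The steps above can be sketched on a small made-up vector (the gni values below are invented and stand in for human$GNI). The pipe operator is loaded explicitly from magrittr here as a precaution; in the exercise environment it is already available.

```r
library(stringr)
library(magrittr)  # provides the pipe operator %>%

# Made-up values standing in for the GNI column
gni <- c("64,992", "42,261", "908")

# Look at the structure: a character vector
str(gni)

# Remove the comma and pipe the result into as.numeric()
gni_num <- str_replace(gni, pattern = ",", replacement = "") %>% as.numeric()
gni_num

# str_replace() only replaces the first match in each string;
# values with several commas would need str_replace_all()
big <- str_replace_all("1,234,567", pattern = ",", replacement = "") %>% as.numeric()
```

Since str_replace() touches only the first comma in each string, it is enough when every value has at most one thousands separator; str_replace_all() is the safer choice otherwise.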

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# tidyr package and human are available

# access the stringr package
library(stringr)

# look at the structure of the GNI column in 'human'


# remove the commas from GNI and print out a numeric version of it
str_replace(human$GNI, pattern = ",", replacement = "")