Exercise

Load some text

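Text mining begins with loading some text data into R, which we'll do with the read.csv() function. In versions of R before 4.0.0, read.csv() by default converts character strings to factors, treating them as category levels like Male/Female. To prevent this from happening, it's important to use the argument stringsAsFactors = FALSE. From R 4.0.0 onward this is already the default, but passing it explicitly is harmless and keeps the code working the same way on older versions.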
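To see what the argument changes, here is a minimal sketch that reads the same file both ways. The two-row CSV and the names tmp, as_factor and as_char are made up for illustration and are not the exercise data.

```r
# Hypothetical two-row sample written to a temporary file (not the exercise data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("num,text", "1,I love coffee", "2,Coffee time"), tmp)

# Read the same file twice, once with each setting.
as_factor <- read.csv(tmp, stringsAsFactors = TRUE)
as_char   <- read.csv(tmp, stringsAsFactors = FALSE)

class(as_factor$text)  # "factor"
class(as_char$text)    # "character"
```

Text stored as a factor is treated as a fixed set of categories, which is rarely what you want for free-form text such as tweets.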

A best practice is to examine the object you read in to make sure you know which column(s) are important. The str() function provides an efficient way of doing this.

If the data frame contains columns that are not text, you may want to make a new object using only the column that holds the text (e.g., some_object$column_name).
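The inspect-then-extract pattern can be sketched as follows. The sample file, its column names (num, text, retweetCount) and the object names sample_tweets and sample_text are assumptions for illustration only; the real data may be laid out differently.

```r
# Hypothetical sample data written to a temporary file (not the exercise data).
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("num,text,retweetCount",
    "1,Coffee first,0",
    "2,Need more coffee,3"),
  tmp
)

# Load the file, keeping strings as character vectors.
sample_tweets <- read.csv(tmp, stringsAsFactors = FALSE)

# str() prints each column's name, type and first few values.
str(sample_tweets)

# Keep only the text column as a plain character vector.
sample_text <- sample_tweets$text
```

Here str() shows that only the text column is of type character, so that is the one to extract with the $ operator.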

Be aware that this is real data from Twitter and as such there is always a risk that it may contain profanity or other offensive content (in this exercise, and any following exercises that also use real Twitter data).

Instructions

The data has been loaded for you and is available in coffee_data_file.

  • Create a new object tweets using read.csv() on the file coffee_data_file, which contains tweets mentioning coffee. Remember to add stringsAsFactors = FALSE!
  • Examine the tweets object using str() to determine which column has the text you'll want to analyze.
  • Make a new coffee_tweets object using only the text column you identified earlier. To do so, use the $ operator and column name.