Exercise

Step 2: Identify Text Sources

In this short exercise you will load and examine a small corpus of property rental reviews from around Boston. You probably already know read.csv(), which loads a comma-separated file into a data frame. Here you will also need to specify stringsAsFactors = FALSE when loading the corpus, which ensures that the reviews are read as character vectors rather than factors. This may seem mundane, but the point of this chapter is to take you through an entire workflow from start to finish, so let's begin with data ingestion!
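To see what stringsAsFactors changes, here is a minimal sketch. It does not use the course data: toy_file and its two reviews are made up for illustration, written to a temporary file so the snippet runs on its own. Note that since R 4.0.0 the default for stringsAsFactors is already FALSE, so passing it explicitly matters mainly on older versions.

```r
# Hypothetical stand-in for the exercise's CSV file: two made-up reviews
toy_file <- tempfile(fileext = ".csv")
writeLines(c("id,comments",
             "1,Great location near the T",
             "2,Small room but very clean"),
           toy_file)

# Load the same file with and without string-to-factor conversion
as_text    <- read.csv(toy_file, stringsAsFactors = FALSE)
as_factors <- read.csv(toy_file, stringsAsFactors = TRUE)

class(as_text$comments)     # "character"
class(as_factors$comments)  # "factor"
```

Text mining functions generally expect character input, which is why the character version is the one you want here.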

Next, apply str() to review the data frame's structure. It is a convenient function that compactly displays each column's class and its first few values.

Lastly, apply dim() to print the dimensions of the data frame. For a data frame, the console prints the number of rows followed by the number of columns.
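As a rough illustration of these two inspection steps, the sketch below applies str() and dim() to a small made-up data frame. The name toy_reviews and its contents are hypothetical and only stand in for the real corpus.

```r
# Hypothetical three-row data frame standing in for the review corpus
toy_reviews <- data.frame(
  id = 1:3,
  comments = c("Great location", "Small but clean", "Host was helpful"),
  stringsAsFactors = FALSE
)

# One line per column: name, type, and the first few values
str(toy_reviews)

# Integer vector of length 2: rows first, then columns
dims <- dim(toy_reviews)
dims

# nrow() returns the same value as the first element of dim()
nrow(toy_reviews)
```

Because dim() puts the row count first, dims[1] is the number of reviews when each row holds one review.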

Other functions like head(), tail(), or summary() are often used for data exploration, but in this case we keep the examination short so you can get to the fun sentiment analysis!

Instructions

The Boston property rental reviews are stored in a CSV file whose path is held in the predefined variable bos_reviews_file.

  • Load the property reviews from bos_reviews_file with read.csv(). Call the object bos_reviews. Be sure to pass the argument stringsAsFactors = FALSE so the comments are read as character strings, not factors.
  • Examine the structure of the data frame using the base str() function applied to bos_reviews.
  • Find out how many reviews you are working with by calling dim() on bos_reviews.