Step 2: Identify Text Sources

In this short exercise, you will load and examine a small corpus of property rental reviews from around Boston. You may already know read.csv(), which loads a comma-separated file into a data frame. This may seem mundane, but the point of this chapter is to take you through an entire workflow from start to finish, so let's begin with data ingestion!

Next, you will apply str() to review the data frame's structure. It is a convenient function that compactly displays the class and the first few values of each column.

Lastly, you will apply dim() to print the dimensions of the data frame. For a data frame, your console will print the number of rows followed by the number of columns.

Other functions like head(), tail(), or summary() are often used for data exploration, but in this case we keep the examination short so you can get to the fun part: sentiment analysis!
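For reference, here is a minimal sketch of what those other exploration functions do. The data frame reviews and its columns id and comments are made up for illustration; they are not the course data.

```r
# Hypothetical toy data frame standing in for a set of reviews
reviews <- data.frame(
  id = 1:8,
  comments = paste("Review number", 1:8),
  stringsAsFactors = FALSE
)

# First three rows
first_rows <- head(reviews, 3)

# Last three rows
last_rows <- tail(reviews, 3)

# Per-column summary (quantiles for numeric columns, length and class for text)
summary(reviews)
```

Both head() and tail() default to six rows when the second argument is omitted.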

The Boston property rental reviews are stored in a CSV file whose path is held in the predefined variable bos_reviews_file.

Load the property reviews from bos_reviews_file with read.csv(). Call the object bos_reviews.
Examine the structure of the data frame using the base str() function applied to bos_reviews.
Find out how many reviews you are working with by calling dim() on bos_reviews.
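The three steps above can be sketched as follows. In the exercise, bos_reviews_file is already defined for you; to keep this sketch runnable on its own, it points that variable at a temporary CSV with made-up rows and assumed column names (id, comments), so the real file's columns and row count will differ.

```r
# Stand-in for the course's predefined bos_reviews_file: a temporary CSV
# with made-up rows and hypothetical column names (id, comments).
bos_reviews_file <- tempfile(fileext = ".csv")
writeLines(
  c("id,comments",
    "1,\"Great location, very clean\"",
    "2,Noisy street but a friendly host",
    "3,Would stay again"),
  bos_reviews_file
)

# Load the reviews into a data frame, keeping text as character strings
bos_reviews <- read.csv(bos_reviews_file, stringsAsFactors = FALSE)

# Compact view of each column's class and first values
str(bos_reviews)

# Number of rows (reviews) followed by number of columns
dim(bos_reviews)
```

Setting stringsAsFactors = FALSE is a choice made here so the review text stays as character data in older R versions too; it is already the default from R 4.0.0 onward.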
