Introduction to Natural Language Processing in R

Exercise

An example of failing at text analysis

Earlier in the course, you learned how useful it is to remove stop words before conducting text analysis. In the most recent chapter, you reviewed using cosine similarity to identify texts that are similar to each other.

In this exercise, you will see how easy it is to use text analysis improperly. You will compute cosine similarities for the chapters of the book Animal Farm without removing stop words.

Instructions

  • Review the provided code to create word counts. This has been completed for you.
  • Using the pairwise_similarity() function from widyr, calculate the cosine similarity between each pair of chapters, using the chapter column as the item.
  • Arrange the results with the highest similarity values first.
  • Calculate the mean of the similarity values.
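
The steps above can be sketched roughly as follows. This is a minimal illustration, not the course's provided code: the `animal_farm` tibble here is a small hypothetical stand-in with made-up sentences, and the column names `chapter` and `text_column` are assumptions. Stop words are deliberately left in, so words such as "the" contribute to every pairwise similarity.

```r
library(dplyr)
library(tidytext)
library(widyr)

# Hypothetical stand-in for the course data: one row per chapter.
# The sentences are invented for illustration, not taken from the book.
animal_farm <- tibble(
  chapter = c("Chapter 1", "Chapter 2", "Chapter 3"),
  text_column = c(
    "The animals of the farm met in the barn at night",
    "The pigs took the lead and the animals followed them",
    "The harvest was the biggest the farm had ever seen"
  )
)

# Create word counts per chapter WITHOUT removing stop words
word_counts <- animal_farm %>%
  unnest_tokens(word, text_column) %>%
  count(chapter, word)

# Cosine similarity between each pair of chapters,
# arranged with the highest similarity values first
comparisons <- word_counts %>%
  pairwise_similarity(chapter, word, n) %>%
  arrange(desc(similarity))

print(comparisons)

# Mean of the similarity values across all chapter pairs
mean_similarity <- comparisons %>%
  summarize(mean_similarity = mean(similarity))

print(mean_similarity)
```

To see the effect of the stop words, one option is to add `anti_join(stop_words, by = "word")` after `unnest_tokens()` and compare the resulting mean with the one computed here.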