An example of failing at text analysis
Earlier in the course, you saw how much removing stop words matters before conducting text analysis. In this most recent chapter, you used cosine similarity to identify texts that are similar to each other.
In this exercise, you will explore the very real possibility of failing to use text analysis properly. You will compute cosine similarities for the chapters in the book Animal Farm without removing stop words.
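To see why leaving stop words in can mislead, here is a minimal base R sketch. The word counts are made up for illustration and are not taken from Animal Farm; cosine_similarity() and stop_words_toy are hypothetical names introduced only for this example.

```r
# Cosine similarity between two numeric vectors of word counts
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical word counts for two chapters (illustrative numbers only)
chapter_a <- c(the = 50, of = 30, and = 25, windmill = 5, harvest = 0)
chapter_b <- c(the = 48, of = 28, and = 27, windmill = 0, harvest = 6)

# A tiny, hand-picked stop word list for this toy example
stop_words_toy <- c("the", "of", "and")

# Similarity with every word included
with_stop <- cosine_similarity(chapter_a, chapter_b)

# Similarity after dropping the toy stop words
keep <- !(names(chapter_a) %in% stop_words_toy)
without_stop <- cosine_similarity(chapter_a[keep], chapter_b[keep])

print(c(with_stop = with_stop, without_stop = without_stop))
```

In this toy case the two chapters share only stop words, so the similarity is close to 1 with them and drops to 0 without them. Real chapters will be less extreme, but the high-frequency words pull the score in the same direction.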
This exercise is part of the course
Introduction to Natural Language Processing in R
Exercise instructions
- Review the provided code to create word counts. This has been completed for you.
- Using the pairwise_similarity() function from widyr, calculate the cosine similarities for each chapter in the chapter column.
- Arrange the results with the highest similarity values first.
- Calculate the average (mean) of the similarity values.
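The steps above can be tried first on a small toy table. This is only a sketch: toy_counts, doc, and term are hypothetical names and the counts are invented. It assumes the dplyr, tibble, and widyr packages are installed, and that pairwise_similarity() takes the item column, the feature column, and the value column in that order.

```r
library(dplyr)
library(tibble)
library(widyr)

# Hypothetical word counts: one row per (document, term) pair
toy_counts <- tribble(
  ~doc, ~term,      ~n,
  "a",  "the",      10,
  "a",  "farm",      2,
  "b",  "the",       9,
  "b",  "pigs",      3,
  "c",  "windmill",  4,
  "c",  "farm",      1
)

# Cosine similarity for every pair of documents, highest first
toy_comparisons <- toy_counts %>%
  pairwise_similarity(doc, term, n) %>%
  arrange(desc(similarity))

print(toy_comparisons)

# Average similarity across all pairs
toy_comparisons %>%
  summarize(mean = mean(similarity)) %>%
  print()
```

The result has the columns item1, item2, and similarity. Documents a and b come out on top because they share the heavily weighted word "the", which is the effect this exercise is meant to expose.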
Interactive exercise
Try this exercise by completing the sample code.
# Create word counts
animal_farm_counts <- animal_farm %>%
  unnest_tokens(word, text_column) %>%
  count(chapter, word)

# Calculate the cosine similarity by chapter, using words
comparisons <- animal_farm_counts %>%
  ___(___, ___, n) %>%
  arrange(desc(___))

# Print the mean of the similarity values
comparisons %>%
  summarize(mean = ___(___))