
Sparse matrices

In the video lesson, you learned about sparse matrices. Sparse matrices can become computational nightmares as the number of text documents and the number of unique words grow. Word representations built from tweets easily produce sparse matrices because tweets are full of emojis, slang, acronyms, and other nonstandard language, so most words appear in only a few tweets.

In this exercise you will walk through the steps to calculate how sparse the Russian tweet dataset is. Note that this is a small example of how quickly text analysis can become a major computational problem.
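
To make the idea of sparsity concrete, here is a minimal base R sketch on a made-up three-tweet corpus. It is not the exercise solution and does not use the `russian_tweets` data; the names `docs`, `vocab`, `dtm`, and `sparsity` are hypothetical, and sparsity is assumed here to mean the share of document-word cells that are zero.

```r
# Toy corpus: three short "tweets" (made-up data, not the russian_tweets dataset)
docs <- c(
  t1 = "cats chase mice",
  t2 = "dogs chase cats",
  t3 = "birds sing"
)

# Split each document into lowercase words
tokens <- strsplit(tolower(docs), "\\s+")

# Build a document-term count matrix:
# one row per document, one column per unique word
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(w) table(factor(w, levels = vocab))))

# Sparsity (assumed definition): share of cells in the matrix that are zero
n_cells <- nrow(dtm) * ncol(dtm)
n_nonzero <- sum(dtm > 0)
sparsity <- 1 - n_nonzero / n_cells

print(dtm)
print(sparsity)
```

Even in this tiny example, 10 of the 18 cells are zero. With thousands of tweets and tens of thousands of unique words, the number of cells grows as documents times words while each tweet still fills only a handful of them.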

This exercise is part of the course Introduction to Natural Language Processing in R.

Hands-on interactive exercise

Complete the sample code below to finish this exercise.

# Tokenize and remove stop words
tidy_tweets <- russian_tweets %>%
  ___(word, content) %>%
  ___(stop_words)
# Count by word
unique_words <- tidy_tweets %>%
  count(___)