
Creating a tibble from a corpus

To further explore the crude oil corpus you received from a coworker, you decide to build a pipeline that cleans the text of its documents. Rather than working out how to do this with the tm package, you convert the corpus into a tibble so that you can use the functions you already know: unnest_tokens(), count(), and anti_join(). The corpus crude contains both the metadata and the text of each document.

This exercise is part of the course Introduction to Natural Language Processing in R.

Exercise instructions

  • Convert the corpus into a tibble.
  • Use names() to print the column names.
  • Tokenize the text column of crude_tibble by word, count the words, and remove stop words.


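# Load the packages used in this exercise
library(tm)        # provides the crude corpus
library(tidytext)  # tidy(), unnest_tokens(), stop_words
library(dplyr)     # count(), anti_join(), %>%
data("crude")

# Create a tibble & review its column names
crude_tibble <- tidy(crude)
names(crude_tibble)

crude_counts <- crude_tibble %>%
  # Tokenize by word
  unnest_tokens(word, text) %>%
  # Count by word
  count(word, sort = TRUE) %>%
  # Remove stop words
  anti_join(stop_words, by = "word")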
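To see what each step of the pipeline does without loading the full corpus, the sketch below runs the same unnest_tokens() → count() → anti_join() chain on a tiny hand-made tibble. The tibble toy and its two sentences are hypothetical stand-ins for crude_tibble, and the sketch assumes dplyr and tidytext are installed. Because unnest_tokens() lowercases by default, "The" and "the" are counted together, and anti_join() then drops every word that appears in tidytext's stop_words table.

```r
library(dplyr)     # tibble(), count(), anti_join(), %>%
library(tidytext)  # unnest_tokens(), stop_words

# Hypothetical two-document tibble standing in for crude_tibble
toy <- tibble(
  id = c("1", "2"),
  text = c("The price of oil rose.", "Oil prices and the OPEC output.")
)

toy_counts <- toy %>%
  # One row per word: lowercased, punctuation stripped
  unnest_tokens(word, text) %>%
  # Number of occurrences of each word, most frequent first
  count(word, sort = TRUE) %>%
  # Keep only words that are not in the stop_words table
  anti_join(stop_words, by = "word")

toy_counts
```

Here "oil" appears twice across the two documents, while words such as "the", "of", and "and" are removed as stop words.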