Creating a tibble from a corpus
To further explore the corpus on crude oil data that you received from a coworker, you have decided to create a pipeline to clean the text contained in the documents. Instead of exploring how to do this with the tm package, you have decided to transform the corpus into a tibble so you can use the functions unnest_tokens(), count(), and anti_join() that you are already familiar with. The corpus crude contains both the metadata and the text of each document.
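To see what "both the metadata and the text" means in practice, here is a minimal sketch for inspecting one document of the corpus. It assumes that crude is the example corpus shipped with the tm package and loaded via data("crude"); the names first_doc and first_text are only illustrative.

```r
# Assumption: `crude` is the example corpus bundled with the tm package.
library(tm)

data("crude")

# Take one document out of the corpus
first_doc <- crude[[1]]

# Document-level metadata (author, heading, id, and so on)
print(meta(first_doc))

# The raw text of the same document
first_text <- content(first_doc)
print(first_text)
```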
This exercise is part of the course
Introduction to Natural Language Processing in R
Exercise instructions
- Convert the corpus into a tibble.
- Use names() to print out the column names.
- Tokenize (by word), count, and remove stop words from the text column of crude_tibble.
Hands-on interactive exercise
Complete this exercise by finishing the sample code.
# Create a tibble & Review
crude_tibble <- ___(crude)
___(crude_tibble)
crude_counts <- crude_tibble %>%
  # Tokenize by word
  ___(___, text) %>%
  # Count by word
  ___(word, sort = TRUE) %>%
  # Remove stop words
  ___(stop_words)
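
One possible way to fill in the blanks is sketched below. It is not presented as the official solution. It assumes that crude comes from the tm package, that tidy() is the corpus method provided by tidytext, and that stop_words is the data frame bundled with tidytext. The by = "word" argument is an addition that only makes the join column explicit.

```r
# A sketch under stated assumptions, not the official solution.
library(tm)        # assumed source of the `crude` corpus
library(tidytext)  # tidy() for corpora, unnest_tokens(), stop_words
library(dplyr)     # count(), anti_join(), %>%

data("crude")

# Create a tibble & review its column names
crude_tibble <- tidy(crude)
names(crude_tibble)

crude_counts <- crude_tibble %>%
  # Tokenize by word: one row per word, stored in a new column `word`
  unnest_tokens(word, text) %>%
  # Count by word, most frequent first
  count(word, sort = TRUE) %>%
  # Remove stop words by dropping rows whose word appears in stop_words
  anti_join(stop_words, by = "word")

head(crude_counts)
```

Counting before the anti-join, as in the exercise template, gives the same result as filtering first, because the join only drops whole rows for words that appear in stop_words.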