Shakespearean language preprocessing pipeline

Over at PyBooks, the team wants to transform a vast library of Shakespearean text data for further analysis. The most efficient way to do this is with a text processing pipeline, starting with the preprocessing steps.

The following have been loaded for you: torch, nltk, stopwords, PorterStemmer, get_tokenizer.

The Shakespearean text data is saved as shakespeare and the sentences have already been extracted.

Instructions 1 / 3

Create a list of unique English stopwords, saving them to stop_words.
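One possible way to approach this step is sketched below. It assumes the NLTK stopwords corpus has already been downloaded; if it is not available, the sketch falls back to a tiny hypothetical word list (not part of the exercise) so that it still runs. Converting the list to a set is what removes duplicates and makes later membership checks fast.

```python
# Minimal sketch: build a collection of unique English stopwords.
try:
    from nltk.corpus import stopwords
    raw_words = stopwords.words("english")
except (ImportError, LookupError):
    # Hypothetical fallback, used only when NLTK or its stopwords
    # corpus is not installed in the current environment.
    raw_words = ["the", "a", "and", "of", "the", "a"]

# A set keeps each stopword exactly once.
stop_words = set(raw_words)

print(len(raw_words), "entries,", len(stop_words), "unique stopwords")
```

A set is used here rather than a deduplicated list because the later filtering step only needs to ask whether a token is a stopword, and set lookups do not depend on the size of the collection.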
