Shakespearean language preprocessing pipeline
At PyBooks, the team wants to transform a large library of Shakespearean text for further analysis. A text processing pipeline is a practical way to do this, and its first stage is preprocessing: tokenizing the text, removing stopwords, and stemming the remaining words.
The following have been loaded for you: torch, nltk, stopwords, PorterStemmer, and get_tokenizer.
The Shakespearean text data is saved as shakespeare, and the sentences have already been extracted.
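The three preprocessing stages can be sketched as one function that applies them in order. This is a minimal sketch, not the course's solution: `preprocess_sentence` and `simple_tokenize` are hypothetical helpers, and `simple_tokenize` is only a rough stand-in for torchtext's `basic_english` tokenizer so the example runs without torchtext or NLTK data installed.

```python
import re
from typing import Callable, Container, List

def simple_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: lowercase, then split into words and single punctuation marks.

    Loosely in the spirit of torchtext's "basic_english" tokenizer, but not identical to it.
    """
    return re.findall(r"[a-z0-9']+|[.,!?;:()]", text.lower())

def preprocess_sentence(
    sentence: str,
    tokenize: Callable[[str], List[str]],
    stop_words: Container[str],
    stem: Callable[[str], str],
) -> List[str]:
    """Tokenize a sentence, drop stopwords, then stem each remaining token."""
    tokens = tokenize(sentence)
    filtered = [tok for tok in tokens if tok not in stop_words]
    return [stem(tok) for tok in filtered]

# Demo with stand-ins: a tiny hypothetical stopword set and an identity "stemmer".
demo_stop_words = {"to", "or", "not", "that", "is", "the"}
print(preprocess_sentence("To be, or not to be: that is the question.",
                          simple_tokenize, demo_stop_words, lambda w: w))
# → ['be', ',', 'be', ':', 'question', '.']
```

Passing the components in as arguments keeps the order of operations fixed while letting you swap in the real tools, for example `preprocess_sentence(sentence, get_tokenizer("basic_english"), stop_words, PorterStemmer().stem)`, assuming those objects are set up as in the exercise.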
This exercise is part of the course Deep Learning for Text with PyTorch.
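from nltk.corpus import stopwords

# Create a set of English stopwords (run nltk.download("stopwords") first if the corpus is missing)
stop_words = set(stopwords.words("english"))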
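NLTK's English stopword list targets modern English, so it likely misses archaic function words common in Shakespeare, such as "thou" or "hath". The sketch below assumes hypothetical helpers `build_stopword_set` and `remove_stopwords` and a hand-picked `ARCHAIC_STOPWORDS` set (not part of NLTK); it merges such extras into a base list and filters tokens with a case-insensitive set lookup.

```python
from typing import Iterable, List, Set

# Hypothetical, hand-picked Early Modern English function words; NOT part of NLTK's list.
ARCHAIC_STOPWORDS = {"thou", "thee", "thy", "thine", "hath", "doth"}

def build_stopword_set(base: Iterable[str], extra: Iterable[str] = ()) -> Set[str]:
    """Merge a base stopword list with extra words, lowercasing everything."""
    return {w.lower() for w in base} | {w.lower() for w in extra}

def remove_stopwords(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Keep tokens whose lowercase form is not a stopword (set lookup is O(1) on average)."""
    return [tok for tok in tokens if tok.lower() not in stop_words]

# Demo with a tiny stand-in base list; in the exercise you would pass stopwords.words("english").
demo_stop = build_stopword_set(["I", "to", "a", "shall"], ARCHAIC_STOPWORDS)
print(remove_stopwords(["Shall", "I", "compare", "thee", "to", "a", "summer's", "day"], demo_stop))
# → ['compare', "summer's", 'day']
```

Using a set rather than a list keeps each membership test cheap when filtering a large corpus, which is also why the exercise wraps the stopword list in set(...).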