Shakespearean language preprocessing pipeline
Over at PyBooks, the team wants to transform a vast library of Shakespearean text data for further analysis. The most efficient way to do this is with a text processing pipeline, starting with the preprocessing steps.
The following have been loaded for you: torch, nltk, stopwords, PorterStemmer, and get_tokenizer.
The Shakespearean text data is saved as shakespeare, and the sentences have already been extracted.
This exercise is part of the course Deep Learning for Text with PyTorch.
Have a go at this exercise by completing this sample code.
# Create a set of English stopwords
stop_words = set(stopwords.words('english'))
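To illustrate how the full pipeline fits together, here is a minimal, self-contained sketch of the steps (tokenize, remove stopwords, stem). It uses a tiny hardcoded stopword subset, a regex tokenizer, and a crude suffix-stripping function as stand-ins for the loaded stopwords corpus, get_tokenizer, and PorterStemmer, so it runs without NLTK data downloads; the stand-in names (simple_tokenizer, simple_stem, preprocess) are illustrative, not part of the exercise.

```python
import re

# Assumption: a tiny stopword subset standing in for stopwords.words('english')
stop_words = {"the", "and", "to", "of", "a", "in", "is", "not"}

def simple_tokenizer(text):
    # Lowercase and split on non-letter runs, mimicking a basic_english tokenizer
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]

def simple_stem(token):
    # Crude suffix stripping as a stand-in for PorterStemmer().stem
    for suffix in ("ing", "ly", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(sentence):
    # Pipeline: tokenize -> drop stopwords -> stem
    tokens = simple_tokenizer(sentence)
    tokens = [t for t in tokens if t not in stop_words]
    return [simple_stem(t) for t in tokens]

print(preprocess("To be, or not to be, that is the question"))
# → ['be', 'or', 'be', 'that', 'question']
```

In the exercise itself, the same three stages would use the loaded tools instead: the NLTK stopwords corpus, the torchtext tokenizer from get_tokenizer, and PorterStemmer.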