Shakespearean language preprocessing pipeline
Over at PyBooks, the team wants to transform a vast library of Shakespearean text data for further analysis. The most efficient approach is a text processing pipeline, beginning with the preprocessing steps: tokenization, stopword removal, and stemming.
The following have been loaded for you: torch, nltk, stopwords, PorterStemmer, and get_tokenizer.
The Shakespearean text data is saved as shakespeare and the sentences have already been extracted.
This exercise is part of the course
Deep Learning for Text with PyTorch
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a set of English stopwords (stopwords comes from nltk.corpus)
stop_words = set(stopwords.words('english'))
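The sample code above only starts the stopword step. As a rough illustration of the full pipeline the exercise builds toward, here is a dependency-free sketch of lowercasing, tokenizing, removing stopwords, and stemming a Shakespearean sentence. Note the stand-ins: the course's version would use `get_tokenizer("basic_english")`, `stopwords.words('english')`, and `PorterStemmer`, whereas the regex tokenizer, the tiny hand-picked stopword set, and the naive suffix-stripping stemmer below are simplified placeholders, not the real NLTK/torchtext behavior.

```python
import re

# Tiny stand-in for NLTK's English stopword list (illustrative only).
STOP_WORDS = {"to", "or", "not", "that", "is", "the", "a", "an", "and", "of", "in", "it"}

def tokenize(sentence):
    # Lowercase and split on runs of non-letters, roughly like a
    # basic_english tokenizer; drops empty strings from the edges.
    return [tok for tok in re.split(r"[^a-z']+", sentence.lower()) if tok]

def stem(token):
    # Naive suffix stripping; a real PorterStemmer is far more sophisticated.
    for suffix in ("ing", "ly", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(sentences):
    # For each sentence: tokenize, drop stopwords, then stem what remains.
    return [
        [stem(tok) for tok in tokenize(sentence) if tok not in STOP_WORDS]
        for sentence in sentences
    ]

# A sample sentence standing in for the extracted `shakespeare` sentences.
shakespeare = ["To be, or not to be, that is the question:"]
print(preprocess(shakespeare))  # → [['be', 'be', 'question']]
```

The order matters: stopwords are matched against the raw lowercase tokens before stemming, so the stopword list is not distorted by stemmed forms.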