Shakespearean language encoder
With the preprocessed Shakespearean text at your fingertips, you now need to encode it into a numerical representation. You will need to define the encoding steps before putting the pipeline together. To better handle large amounts of data and efficiently perform the encoding, you will use PyTorch's Dataset and DataLoader for batching and shuffling the data.
The following has been loaded for you: torch, nltk, stopwords, PorterStemmer, get_tokenizer, CountVectorizer, Dataset, DataLoader, and preprocess_sentences.
The processed_shakespeare from the Shakespearean text is also available to you.
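As a rough illustration of what "encoding into a numerical representation" can look like, the sketch below builds a bag-of-words matrix with scikit-learn's CountVectorizer. The two toy sentences are stand-ins invented for this example, not the contents of processed_shakespeare, and the course's preprocess_sentences helper is not used because its implementation is not shown here.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in sentences (an assumption for illustration only;
# the exercise itself uses processed_shakespeare).
toy_sentences = ["to be or not to be", "be not afraid"]

# CountVectorizer lowercases text and, with its default token pattern,
# keeps only tokens of two or more characters.
vectorizer = CountVectorizer()

# fit_transform learns the vocabulary and returns a sparse count matrix;
# toarray() converts it to a dense NumPy array, one row per sentence.
counts = vectorizer.fit_transform(toy_sentences).toarray()

# vocabulary_ maps each token to its column index in the matrix.
vocab = vectorizer.vocabulary_

print(counts.shape)
print(sorted(vocab, key=vocab.get))
```

Each row is one sentence and each column is one vocabulary word, so the matrix width grows with the vocabulary. That is one reason batching the encoded data becomes useful on larger corpora.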
This exercise is part of the course Deep Learning for Text with PyTorch.
Have a go at this exercise by completing the sample code below; each ____ marks a blank to fill in.
# Define your Dataset class
class ____(Dataset):
    def __init__(self, data):
        self.data = ____

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.____[____]
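For reference, here is a minimal sketch of how a map-style Dataset of this shape can be paired with a DataLoader for batching and shuffling. The class name EncodedTextDataset and the small tensor of made-up counts are hypothetical choices for this example; this is one possible completion under those assumptions, not necessarily the exercise's intended answer.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class EncodedTextDataset(Dataset):  # hypothetical name, chosen for this sketch
    def __init__(self, data):
        # Store the already-encoded samples (one row per sentence).
        self.data = data

    def __len__(self):
        # Number of samples; the DataLoader uses this to build its index range.
        return len(self.data)

    def __getitem__(self, idx):
        # Return the single sample at position idx.
        return self.data[idx]


# Made-up encoded data: 5 "sentences" over a 4-word vocabulary (assumption).
encoded = torch.tensor(
    [[1, 0, 2, 0],
     [0, 1, 0, 1],
     [3, 0, 0, 0],
     [0, 0, 1, 1],
     [1, 1, 1, 1]],
    dtype=torch.float32,
)

dataset = EncodedTextDataset(encoded)

# shuffle=True reorders samples each epoch; the default collate function
# stacks the per-sample tensors into one batch tensor.
loader = DataLoader(dataset, batch_size=2, shuffle=True)

batch_shapes = [tuple(batch.shape) for batch in loader]
print(batch_shapes)
```

With five samples and a batch size of 2, the final batch holds the single leftover sample. Passing drop_last=True to DataLoader would discard it instead.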