
Shakespearean language encoder

With the preprocessed Shakespearean text in hand, the next step is to encode it into a numerical representation. You will define the encoding steps first and then put the pipeline together. To handle large amounts of data efficiently, you will use PyTorch's Dataset and DataLoader to batch and shuffle the data.
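
To make "encode it into a numerical representation" concrete, here is a minimal sketch of bag-of-words encoding with scikit-learn's CountVectorizer. The two sample sentences and the default vectorizer settings are assumptions made for illustration; the exercise's own data and settings are not shown here.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two invented sample sentences (assumption: stand-ins for the real preprocessed text)
sample_sentences = [
    "love looks not with the eyes",
    "the course of true love",
]

# Default settings: lowercase, keep tokens of two or more characters, no stop-word removal
vectorizer = CountVectorizer()

# fit_transform learns the vocabulary and returns a sparse count matrix;
# toarray() converts it to a dense NumPy array with one row per sentence
encoded = vectorizer.fit_transform(sample_sentences).toarray()

# vocabulary_ maps each term to its column index (columns are in alphabetical order)
print(sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get))
print(encoded)
```

Each row is one sentence and each column counts how often one vocabulary term occurs in it, so word order is discarded and only frequencies remain.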

The following has been loaded for you: torch, nltk, stopwords, PorterStemmer, get_tokenizer, CountVectorizer, Dataset, DataLoader, and preprocess_sentences.

The preprocessed Shakespearean text is also available to you as processed_shakespeare.

This exercise is part of the course

Deep Learning for Text with PyTorch

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Define your Dataset class
class ____(Dataset):
    def __init__(self, data):
        self.data = ____
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        return self.____[____]
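
For reference, the Dataset pattern above can be sketched end to end as follows. This is one possible shape under stated assumptions, not the exercise's official solution: the class name ToyTextDataset, the toy tensor, and the batch size are all hypothetical and chosen only to show how a Dataset and a DataLoader fit together.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyTextDataset(Dataset):  # hypothetical name, not taken from the exercise
    def __init__(self, data):
        # Keep a reference to the already-encoded data
        self.data = data

    def __len__(self):
        # Number of samples; DataLoader uses this to plan batches
        return len(self.data)

    def __getitem__(self, idx):
        # Return the single sample at position idx
        return self.data[idx]


# Toy stand-in for encoded text: 6 samples with 2 features each (assumption)
toy_data = torch.arange(12, dtype=torch.float32).reshape(6, 2)

dataset = ToyTextDataset(toy_data)

# shuffle=True reorders samples each epoch; batch_size=4 yields batches of 4 and then 2
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_shapes = [tuple(batch.shape) for batch in loader]
print(batch_shapes)
```

Shuffling changes which rows land in each batch but not the batch sizes, so the last batch is smaller whenever the sample count is not a multiple of batch_size.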