Exercise

Text preprocessing practice

Now it's your turn to apply the techniques you've learned to clean up text for better NLP results. You'll need to remove stop words and non-alphabetic tokens, lemmatize the remaining tokens, and build a new bag-of-words from your cleaned text.

You start with the same tokens you created in the last exercise: lower_tokens. You also have the Counter class imported.

Instructions
100 XP
  • Import the WordNetLemmatizer class from nltk.stem.
  • Create a list alpha_only that contains only the tokens from lower_tokens made up entirely of alphabetic characters. You can use the .isalpha() method to check for this.
  • Create another list called no_stops consisting of words from alpha_only that are not contained in english_stops.
  • Initialize a WordNetLemmatizer object called wordnet_lemmatizer and use its .lemmatize() method on the tokens in no_stops to create a new list called lemmatized.
  • Create a new Counter called bow with the lemmatized words.
  • Lastly, print the 10 most common tokens.
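
The steps above can be sketched roughly as follows. This is one possible approach, not the exercise's reference solution. The values of lower_tokens and english_stops are small stand-ins invented for illustration, and the helper names preprocess and wordnet_lemmatize_or_identity are hypothetical. The second helper falls back to leaving tokens unchanged when NLTK or its WordNet data is not installed, so the sketch still runs in that case.

```python
from collections import Counter


def preprocess(tokens, stop_words, lemmatize):
    """Filter tokens, lemmatize them, and return a bag-of-words Counter."""
    # Keep only tokens made up entirely of alphabetic characters
    alpha_only = [t for t in tokens if t.isalpha()]
    # Drop any token that appears in the stop-word collection
    no_stops = [t for t in alpha_only if t not in stop_words]
    # Reduce each remaining token to its lemma
    lemmatized = [lemmatize(t) for t in no_stops]
    return Counter(lemmatized)


def wordnet_lemmatize_or_identity():
    """Return WordNetLemmatizer().lemmatize, or an identity function
    if NLTK or the WordNet corpus is unavailable (hypothetical helper)."""
    try:
        from nltk.stem import WordNetLemmatizer
        wordnet_lemmatizer = WordNetLemmatizer()
        # Probe once: raises LookupError if the WordNet data is missing
        wordnet_lemmatizer.lemmatize("tokens")
        return wordnet_lemmatizer.lemmatize
    except (ImportError, LookupError):
        return lambda token: token


# Stand-ins for the exercise's preloaded variables (assumed, not the real data)
lower_tokens = ["the", "cats", "sat", "on", "the", "mats", ",", "3", "cats", "!"]
english_stops = {"the", "on"}

bow = preprocess(lower_tokens, english_stops, wordnet_lemmatize_or_identity())

# Print the 10 most common tokens
print(bow.most_common(10))
```

Filtering with .isalpha() before removing stop words keeps the stop-word lookup limited to actual words. Passing the lemmatizer in as a function is a design choice of this sketch that makes the pipeline easy to try with a different lemmatizer.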