Text preprocessing practice
Now it's your turn to apply the techniques you've learned to clean up text for better NLP results. You'll remove stop words and non-alphabetic tokens, lemmatize what remains, and build a new bag-of-words from the cleaned text.
You start with the same tokens you created in the last exercise, lower_tokens. You also have the Counter class imported.
This exercise is part of the course Introduction to Natural Language Processing in Python.
Exercise instructions
- Import the WordNetLemmatizer class from nltk.stem.
- Create a list alpha_only that contains only alphabetic tokens. You can use the .isalpha() method to check for this.
- Create another list called no_stops consisting of words from alpha_only that are not contained in english_stops.
- Initialize a WordNetLemmatizer object called wordnet_lemmatizer and use its .lemmatize() method on the tokens in no_stops to create a new list called lemmatized.
- Create a new Counter called bow with the lemmatized words.
- Lastly, print the 10 most common tokens.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import WordNetLemmatizer
____
# Retain alphabetic words: alpha_only
alpha_only = [t for t in ____ if ____]
# Remove all stop words: no_stops
no_stops = [t for t in ____ if t not in ____]
# Instantiate the WordNetLemmatizer
wordnet_lemmatizer = ____
# Lemmatize all tokens into a new list: lemmatized
lemmatized = [____ for t in ____]
# Create the bag-of-words: bow
bow = ____(____)
# Print the 10 most common tokens
print(____.____(__))