Exercise

Cleaning a blog post

In this exercise, you have been given an excerpt from a blog post. Your task is to clean this text into a more machine-friendly format. This involves converting the text to lowercase, lemmatizing it, and removing stopwords, punctuation and non-alphabetic characters.

The excerpt is available as a string blog and has been printed to the console. The list of stopwords is available as stopwords.

Instructions
  • Using a list comprehension, loop through doc to extract the lemma_ of each token.
  • Remove stopwords and non-alphabetic tokens using stopwords and isalpha().
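
The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the Token class is a hypothetical stand-in for a spaCy token (only lemma_ is modelled) so that the snippet runs without a language model, and the sample doc and stopwords values are made up rather than taken from the exercise. The names lemmas, a_lemmas and cleaned are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a spaCy token; only lemma_ is modelled."""
    lemma_: str


# Assumed sample data. In the exercise, doc would be the parsed, lowercased blog text.
doc = [
    Token("the"), Token("senator"), Token("write"), Token(","),
    Token("21"), Token("century"), Token("policy"), Token("."),
]
stopwords = {"the", "a", "of"}  # assumed; the exercise supplies its own list

# Step 1: extract the lemma of each token with a list comprehension.
lemmas = [token.lemma_ for token in doc]

# Step 2: keep only alphabetic lemmas that are not stopwords.
a_lemmas = [
    lemma for lemma in lemmas
    if lemma.isalpha() and lemma not in stopwords
]

# Join the surviving lemmas back into a single cleaned string.
cleaned = " ".join(a_lemmas)
print(cleaned)
```

Checking isalpha() on each lemma drops punctuation and numbers in one pass, so no separate punctuation filter is needed in this sketch.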