Feature Engineering for NLP in Python

Cleaning a blog post

In this exercise, you have been given an excerpt from a blog post. Your task is to clean this text into a more machine-friendly format. This involves converting it to lowercase, lemmatizing it, and removing stopwords, punctuation, and other non-alphabetic characters.

The excerpt is available as the string blog and has been printed to the console. The list of stopwords is available as stopwords.

Instructions

  • Using a list comprehension, loop through doc to extract the lemma_ of each token.
  • Remove stopwords and non-alphabetic tokens by checking each lemma against stopwords and calling isalpha() on it.
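The two steps above can be sketched as a small helper. This is a minimal sketch, not the course's reference solution. It assumes doc is an iterable of spaCy-style tokens, each exposing a string lemma_ attribute, and that stopwords is any collection supporting `in`. The names `clean_lemmas` and `FakeToken` are hypothetical. `FakeToken` exists only so the sketch runs without downloading a spaCy model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Container, Iterable, Protocol


class HasLemma(Protocol):
    """Anything exposing a spaCy-style `lemma_` string attribute (assumption)."""
    lemma_: str


def clean_lemmas(doc: Iterable[HasLemma], stopwords: Container[str]) -> list[str]:
    # Step 1: extract the lemma of every token with a list comprehension,
    # lowercasing so that e.g. "The" matches "the" in the stopword list.
    lemmas = [token.lemma_.lower() for token in doc]
    # Step 2: keep only purely alphabetic lemmas that are not stopwords;
    # isalpha() drops punctuation, digits, and markers such as "-PRON-".
    return [lemma for lemma in lemmas if lemma.isalpha() and lemma not in stopwords]


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy Token, used only for this demo."""
    lemma_: str


if __name__ == "__main__":
    # With spaCy installed, you would instead pass a real Doc, e.g.
    # doc = spacy.load("en_core_web_sm")(blog); clean_lemmas(doc, stopwords)
    tokens = [FakeToken(w) for w in ["The", "cat", "be", "run", "!", "2", "-PRON-", "quickly"]]
    print(clean_lemmas(tokens, {"the", "be"}))  # ['cat', 'run', 'quickly']
```

Lowercasing happens before the stopword check so that capitalized stopwords are also removed. Whether your spaCy version already lowercases lemmas depends on the model, so the explicit `.lower()` is a defensive choice.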