Exercise

Cleaning TED talks in a dataframe

In this exercise, we will revisit the TED Talks from the first chapter. You have been given a dataframe ted consisting of 5 TED Talks. Your task is to clean these talks using the techniques discussed earlier by writing a function preprocess and applying it to the transcript feature of the dataframe.

The stopwords list is available as stopwords.

Instructions

  • Generate the Doc object for text. Ignore the disable argument for now.
  • Generate lemmas with a list comprehension over the lemma_ attribute of each token.
  • Remove non-alphabetic lemmas by calling isalpha() in the if condition.
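
The three steps above can be sketched roughly as follows. This is a minimal sketch, not the exercise's official solution. The exercise presumably relies on a preloaded spaCy pipeline and the stopwords list as globals, with preprocess taking only the text. Here both are passed in as the parameters nlp and stopwords so the function stands alone. Those two parameters, the stopword filter inside the comprehension, and the helper clean_transcripts are assumptions added for illustration.

```python
def preprocess(text, nlp, stopwords):
    """Lemmatize text, then drop stopwords and non-alphabetic lemmas."""
    # Step 1: generate the Doc object (the disable argument is left out here).
    doc = nlp(text)

    # Step 2: collect the lemma of every token.
    lemmas = [token.lemma_ for token in doc]

    # Step 3: keep only alphabetic lemmas; filtering against stopwords is an
    # assumption based on the list mentioned in the exercise text.
    a_lemmas = [
        lemma for lemma in lemmas
        if lemma.isalpha() and lemma not in stopwords
    ]

    return " ".join(a_lemmas)


def clean_transcripts(ted, nlp, stopwords):
    """Hypothetical helper: return a copy of ted with a cleaned transcript column."""
    cleaned = ted.copy()
    cleaned["transcript"] = cleaned["transcript"].apply(
        lambda text: preprocess(text, nlp, stopwords)
    )
    return cleaned
```

With spaCy installed, nlp would typically be a loaded English pipeline such as spacy.load("en_core_web_sm"), and ted the pandas dataframe supplied by the exercise. Passing them as arguments rather than reading globals simply makes the function easier to try out in isolation.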