Word count of TED talks

ted is a DataFrame that contains the transcripts of 500 TED talks. Your job is to compute a new feature, word_count, which contains the approximate number of words in each talk, and then to compute the average word count across all talks. The transcripts are available as the transcript feature in ted.

To complete this task, you will need to define a function count_words that takes a string as an argument and returns the number of words in the string. You will then apply this function to the transcript feature of ted to create the new feature word_count and compute its mean.
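A minimal sketch of count_words, assuming a "word" is simply any whitespace-separated token. This is why the count is only approximate: punctuation stays attached to its neighbour (for example, "much," counts as one word), and hyphenated terms count as a single token. The sample sentences are made up for illustration.

```python
def count_words(string):
    """Return the approximate number of words in `string`."""
    # str.split() with no argument splits on any run of whitespace
    # and ignores leading/trailing whitespace, so no empty tokens appear.
    words = string.split()
    # Each token in the list counts as one word.
    return len(words)


print(count_words("Thank you so much, Chris."))  # 5
print(count_words("  Hello   world  "))          # 2
```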

Instructions

  • Split the string into a list of words using the split() method.
  • Return the number of elements in words using len().
  • Apply your function to the transcript column of ted to create the new feature word_count.
  • Compute the average word count of the talks using mean().
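The steps above can be sketched end to end with pandas as follows. The real ted DataFrame is not reproduced here. Instead, a small hypothetical stand-in with three made-up transcripts is used, so that the column names (transcript, word_count) match the exercise while the data is illustrative only.

```python
import pandas as pd


def count_words(string):
    # Split on whitespace and count the resulting tokens
    words = string.split()
    return len(words)


# Hypothetical stand-in for `ted`: three short, made-up transcripts
ted = pd.DataFrame({
    "transcript": [
        "Good morning. How are you?",
        "Thank you so much, Chris.",
        "Ideas worth spreading",
    ]
})

# Apply count_words to each transcript to create the word_count feature
ted["word_count"] = ted["transcript"].apply(count_words)

# Average word count across all talks
print(ted["word_count"].mean())
```

Here word_count comes out as 5, 5 and 3, so the mean is 13/3. An equivalent vectorized form is ted["transcript"].str.split().str.len(), which produces the same counts for non-missing strings. The exercise, however, asks for an explicit function applied with apply().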