Word count of TED talks

ted is a DataFrame that contains the transcripts of 500 TED talks. Your job is to compute a new feature, word_count, which contains the approximate number of words in each talk. You also need to compute the average word count of the talks. The transcripts are available as the transcript feature in ted.

To complete this task, define a function count_words that takes a string as an argument and returns the number of words in it. Then apply this function to the transcript feature of ted to create the new feature word_count, and compute its mean.
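The count is only approximate because splitting on whitespace leaves punctuation attached to the neighbouring word and treats a hyphenated term as a single word. A minimal sketch of that behaviour on a plain string follows; the sample sentence is made up for illustration and is not taken from ted.

```python
# Illustrative sketch of a whitespace-based word counter.
# The sample text below is invented; it is not part of the ted data.
def count_words(string):
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops empty pieces
    words = string.split()
    return len(words)

sample = "Good  morning, everyone.\nHow are you?"

# Punctuation stays attached to words ("morning," is one token),
# which is why the result is an approximation
print(count_words(sample))             # 6
print(count_words(""))                 # 0
print(count_words("well-known idea"))  # 2
```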

This exercise is part of the course

Feature Engineering for NLP in Python

Exercise instructions

  • Split string into a list of words using the split() method.
  • Return the number of elements in words using len().
  • Apply your function to the transcript column of ted to create the new feature word_count.
  • Compute the average word count of the talks using mean().

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Function that returns number of words in a string
def count_words(string):
    # Split the string into words
    words = ____.____
    
    # Return the number of words
    return ____(____)

# Create a new feature word_count
ted['word_count'] = ted[____].apply(____)

# Print the average word count of the talks
print(ted[____].____)
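
One way the blanks might be filled in is sketched below. Because the real ted DataFrame is only available inside the exercise environment, this sketch builds a three-row stand-in with the same transcript column; the transcripts in it are invented and the printed mean applies only to this toy data.

```python
import pandas as pd

# Toy stand-in for ted (assumption: the real DataFrame holds 500 transcripts)
ted = pd.DataFrame({
    "transcript": [
        "Thank you so much.",
        "Ideas are worth spreading, always.",
        "Hello to everyone",
    ]
})

# Function that returns number of words in a string
def count_words(string):
    # Split the string into words on whitespace
    words = string.split()

    # Return the number of words
    return len(words)

# Create a new feature word_count
ted["word_count"] = ted["transcript"].apply(count_words)

# Print the average word count of the talks
print(ted["word_count"].mean())  # 4.0 for this toy data
```

The function is passed to apply() by name, without parentheses, so that pandas calls it once per transcript.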