Word count of TED talks
ted is a DataFrame that contains the transcripts of 500 TED talks. Your job is to compute a new feature, word_count, which contains the approximate number of words in each talk. You also need to compute the average word count across the talks. The transcripts are available as the transcript feature in ted.
To complete this task, define a function count_words that takes a string as an argument and returns the number of words in it. Then apply this function to the transcript feature of ted to create the new feature word_count, and compute its mean.
This exercise is part of the course
Feature Engineering for NLP in Python
Exercise instructions
- Split string into a list of words using the split() method.
- Return the number of elements in words using len().
- Apply your function to the transcript column of ted to create the new feature word_count.
- Compute the average word count of the talks using mean().
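To see why the resulting count is only approximate, here is a minimal sketch of what split() does when called with no arguments. The sample text is made up for illustration and does not come from the ted data.

```python
# Made-up sample text (an assumption for illustration, not a real transcript)
text = "Good morning.  How are you?\nI'm well-rested"

# With no arguments, split() breaks on any run of whitespace
# (spaces, tabs, newlines), so the double space adds no empty entry
words = text.split()

print(words)
# ['Good', 'morning.', 'How', 'are', 'you?', "I'm", 'well-rested']
print(len(words))
# 7
```

Punctuation stays attached to the neighbouring word, and hyphenated terms and contractions each count as a single word, so the figure is a whitespace-token count rather than a strict linguistic word count.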
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Function that returns number of words in a string
def count_words(string):
    # Split the string into words
    words = ____.____

    # Return the number of words
    return ____(____)

# Create a new feature word_count
ted['word_count'] = ted[____].apply(____)

# Print the average word count of the talks
print(ted[____].____)
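One possible way to fill in the template is sketched below. Because the real ted DataFrame is not available here, the sketch runs on ted_sample, a hypothetical three-row stand-in with invented transcripts; only the column name transcript and the function name count_words are taken from the exercise.

```python
import pandas as pd


def count_words(string):
    """Return the approximate number of words in a string."""
    # Split the string into words on whitespace
    words = string.split()
    # Return the number of words
    return len(words)


# Hypothetical stand-in for ted (invented transcripts, assumption only)
ted_sample = pd.DataFrame({
    "transcript": [
        "We have lost a lot of time.",
        "Thank you so much.",
        "Imagine a big explosion as you climb through 3,000 ft.",
    ]
})

# Create a new feature word_count by applying the function to each transcript
ted_sample["word_count"] = ted_sample["transcript"].apply(count_words)

# Compute and print the average word count of the talks
average = ted_sample["word_count"].mean()
print(average)
# 7.0
```

On the real data, the same two steps would use ted in place of ted_sample. A vectorized alternative that should give the same counts is ted["transcript"].str.split().str.len(), which avoids the Python-level function call per row.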