Word count of TED talks
ted is a dataframe that contains the transcripts of 500 TED talks. Your job is to compute a new feature, word_count, which contains the approximate number of words in each talk. You also need to compute the average word count of the talks. The transcripts are available as the transcript feature in ted.
To complete this task, you will need to define a function count_words that takes a string as an argument and returns the number of words in the string. You will then need to apply this function to the transcript feature of ted to create the new feature word_count and compute its mean.
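The count is only approximate because splitting on whitespace leaves punctuation and stage directions attached to the tokens. A minimal sketch of that behaviour, using a made-up sentence (the sample text is an assumption, not taken from the ted data):

```python
# Hypothetical snippet of transcript text, invented for illustration
sample = "Good morning. How are you?  (Laughter) It's been great, hasn't it?"

# split() with no argument splits on runs of whitespace,
# so the double space after "you?" does not create an empty token
tokens = sample.split()
print(tokens)
print(len(tokens))

# split(' ') splits on every single space and keeps empty strings,
# which would inflate the count
naive_tokens = sample.split(' ')
print(len(naive_tokens))
```

Tokens such as "morning." and "(Laughter)" each count as one word here, which is why the result is best treated as an estimate.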
This exercise is part of the course Feature Engineering for NLP in Python.
Exercise instructions
- Split string into a list of words using the split() method.
- Return the number of elements in words using len().
- Apply your function to the transcript column of ted to create the new feature word_count.
- Compute the average word count of the talks using mean().
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Function that returns number of words in a string
def count_words(string):
    # Split the string into words
    words = ____.____

    # Return the number of words
    return ____(____)

# Create a new feature word_count
ted['word_count'] = ted[____].apply(____)

# Print the average word count of the talks
print(ted[____].____)
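One possible way to fill in the blanks is sketched below. Because the real ted dataframe is not available here, the example builds a tiny stand-in with three invented transcripts; that stand-in and its contents are assumptions for illustration only.

```python
import pandas as pd

# Function that returns number of words in a string
def count_words(string):
    # Split the string into words on whitespace
    words = string.split()
    # Return the number of words
    return len(words)

# Hypothetical stand-in for ted; the real dataframe holds 500 transcripts
ted = pd.DataFrame({
    "transcript": [
        "Thank you so much.",
        "I have a dream today",
        "Hello to everyone",
    ]
})

# Create a new feature word_count by applying the function row by row
ted["word_count"] = ted["transcript"].apply(count_words)

# Print the average word count of the talks
print(ted["word_count"].mean())
```

With the real ted dataframe, the same last two statements should apply unchanged, assuming every entry in the transcript column is a string.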