Counting words (I)

Once high-level information has been recorded, you can begin creating features based on the actual content of each text. One way to do this is to treat words much as you treated categorical variables in the earlier lessons:

  • For each unique word in the dataset, a column is created.
  • For each entry, the number of times each word occurs is counted, and that count is entered into the corresponding column.

These "count" columns can then be used to train machine learning models.
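
The two steps above can be sketched by hand with the standard library. This is only an illustration of the idea, not how CountVectorizer is implemented: the `texts` corpus is a made-up example, and splitting on whitespace is an assumed, deliberately naive tokenizer.

```python
from collections import Counter

# Hypothetical toy corpus, for illustration only
texts = [
    "the cat sat on the mat",
    "the dog sat",
]

# Naive whitespace tokenization (an assumption; real tokenizers
# also deal with punctuation and letter case)
tokenized = [text.split() for text in texts]

# Step 1: one column per unique word in the dataset,
# sorted so the column order is stable
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# Step 2: one row per entry, holding how many times
# each vocabulary word occurs in that entry
count_rows = []
for tokens in tokenized:
    counts = Counter(tokens)
    count_rows.append([counts[word] for word in vocabulary])

print(vocabulary)
for row in count_rows:
    print(row)
```

Each inner list in `count_rows` lines up with `vocabulary`, so the word "the" gets a count of 2 in the first entry and 1 in the second.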

This exercise is part of the course

Feature Engineering for Machine Learning in Python

Exercise instructions

  • Import CountVectorizer from sklearn.feature_extraction.text.
  • Instantiate CountVectorizer and assign it to cv.
  • Fit the vectorizer to the text_clean column.
  • Print the feature names generated by the vectorizer.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import CountVectorizer
____

# Instantiate CountVectorizer
cv = ____

# Fit the vectorizer
cv.____(speech_df['text_clean'])

# Print feature names
print(cv.____)