n-gram models for movie tag lines
In this exercise, we have been provided with a corpus of more than 9000 movie tag lines. Our job is to generate n-gram models up to n equal to 1, n equal to 2 and n equal to 3 for this data and discover the number of features for each model.
We will then compare the number of features generated for each model.
Diese Übung ist Teil des Kurses
<Kurs>Feature Engineering for NLP in Python</Kurs>Übungsanweisungen
- Generate an n-gram model with n-grams up to n=1. Name it
ng1 - Generate an n-gram model with n-grams up to n=2. Name it
ng2 - Generate an n-Gram Model with n-grams up to n=3. Name it
ng3 - Print the number of features for each model.
Interaktive praktische Übung
Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.
# Generate n-grams upto n=1
vectorizer_ng1 = CountVectorizer(ngram_range=(1,1))
ng1 = vectorizer_ng1.____(corpus)
# Generate n-grams upto n=2
vectorizer_ng2 = CountVectorizer(ngram_range=(1,2))
ng2 = vectorizer_ng2.____(corpus)
# Generate n-grams upto n=3
vectorizer_ng3 = CountVectorizer(ngram_range=(____, ____))
ng3 = vectorizer_ng3.fit_transform(corpus)
# Print the number of features for each model
print("ng1, ng2 and ng3 have %i, %i and %i features respectively" % (ng1.____[1], ng2.____[1], ng3.____[1]))