Exercise

n-gram models for movie tag lines

In this exercise, we have been provided with a corpus of more than 9000 movie tag lines. Our job is to generate three n-gram models for this data, containing n-grams up to n = 1, n = 2 and n = 3 respectively, and to find the number of features in each model.

We will then compare the number of features generated for each model.

Instructions

  • Generate an n-gram model with n-grams up to n=1. Name it ng1.
  • Generate an n-gram model with n-grams up to n=2. Name it ng2.
  • Generate an n-gram model with n-grams up to n=3. Name it ng3.
  • Print the number of features for each model.
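
The steps above can be sketched in plain Python to show what "n-grams up to n" means for the feature count. This is only an illustration under stated assumptions: the three-line `corpus` is a hypothetical stand-in for the real tag line data, the helper `ngram_vocabulary` is not part of any library, and each model is represented here simply by its set of distinct n-grams rather than by a fitted vectorizer.

```python
import re

def ngram_vocabulary(docs, max_n):
    """Collect the distinct word n-grams of length 1..max_n across all docs."""
    vocab = set()
    for doc in docs:
        # Lowercase and keep alphanumeric runs as tokens (a simplifying assumption).
        tokens = re.findall(r"[a-z0-9]+", doc.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                vocab.add(" ".join(tokens[i:i + n]))
    return vocab

# Hypothetical stand-in for the movie tag line corpus.
corpus = [
    "The adventure begins",
    "The adventure never ends",
    "Love never dies",
]

# Each "model" is the set of features (distinct n-grams) it would contain.
ng1 = ngram_vocabulary(corpus, 1)
ng2 = ngram_vocabulary(corpus, 2)
ng3 = ngram_vocabulary(corpus, 3)

# Print the number of features for each model.
print("ng1, ng2 and ng3 have %i, %i and %i features respectively"
      % (len(ng1), len(ng2), len(ng3)))
```

Because each model includes every n-gram of the smaller models, the feature count can only grow as n increases. With a library such as scikit-learn, the same idea is commonly expressed with `CountVectorizer(ngram_range=(1, n))`, although its default tokenization differs slightly from this sketch (for example, it drops single-character tokens).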