Build the winning model

You have arrived! This is where all of your hard work pays off. It's time to build the model that won DrivenData's competition.

You've constructed a robust, powerful pipeline that can process both training and testing data. Now that you understand the data and have all the tools you need, you can solve the whole problem in relatively few lines of code. Wow!

All you need to do is add the HashingVectorizer step to the pipeline to replace the CountVectorizer step.

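The parameters non_negative=True, norm=None, and binary=False make the HashingVectorizer behave like a CountVectorizer with its default settings, so you can swap one for the other. By default, HashingVectorizer L2-normalizes each row and can produce negative feature values from signed hashing; these parameters undo both, leaving raw token counts. Note that non_negative was deprecated in scikit-learn 0.19 and removed in 0.21. On current versions, pass alternate_sign=False instead.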
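As a quick check, here is a small sketch that compares the two vectorizers on three made-up documents (hypothetical data, not from the competition). It assumes scikit-learn 0.21 or later, so it uses alternate_sign=False in place of the removed non_negative=True. The column layouts differ, because HashingVectorizer assigns columns by hash rather than by a learned vocabulary. Hash collisions, however, merge columns rather than drop tokens, so the per-document token totals should match.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

# Hypothetical toy documents for illustration only
docs = ["math teacher salary", "teacher supplies", "math math books"]

# CountVectorizer with default settings: raw integer counts
counts = CountVectorizer().fit_transform(docs)

# HashingVectorizer configured to mimic it (assumes scikit-learn >= 0.21):
# alternate_sign=False keeps values non-negative, norm=None skips L2 normalization
hash_vec = HashingVectorizer(alternate_sign=False, norm=None, binary=False,
                             n_features=2**10)
hashed = hash_vec.transform(docs)  # stateless, so no fit is required

# Total token count per document in each representation
row_counts = np.asarray(counts.sum(axis=1)).ravel()
row_hashed = np.asarray(hashed.sum(axis=1)).ravel()
print(row_counts)
print(row_hashed)
```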

Import HashingVectorizer from sklearn.feature_extraction.text.
Add a HashingVectorizer step to the pipeline.
- Name the step 'vectorizer'.
- Use the TOKENS_ALPHANUMERIC token pattern.
- Specify the ngram_range to be (1, 2).
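The steps above can be sketched in isolation as follows. This is a minimal, hedged sketch, not the full winning pipeline. In the course, the vectorizer sits inside a text-processing branch of a larger pipeline that also handles numeric features, feature selection, interaction terms, and scaling. Here only the text branch is shown, followed by a OneVsRestClassifier wrapping LogisticRegression (an assumed classifier choice). The value of TOKENS_ALPHANUMERIC is assumed to match the pattern defined earlier in the course, and the budget descriptions and labels are hypothetical toy data.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Assumed definition: alphanumeric tokens that are followed by whitespace
TOKENS_ALPHANUMERIC = '[A-Za-z0-9]+(?=\\s+)'

# Hypothetical toy data; trailing spaces let the lookahead match the last word
text = ["math teacher salary ", "science lab supplies ",
        "math textbook purchase ", "lab equipment repair "]
# Multilabel indicator matrix (two hypothetical label columns)
y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

pl = Pipeline([
    # alternate_sign=False replaces non_negative=True on scikit-learn >= 0.21
    ('vectorizer', HashingVectorizer(token_pattern=TOKENS_ALPHANUMERIC,
                                     alternate_sign=False, norm=None,
                                     binary=False, ngram_range=(1, 2))),
    ('clf', OneVsRestClassifier(LogisticRegression())),
])

pl.fit(text, y)
probs = pl.predict_proba(text)  # one probability column per label
print(probs.shape)
```

Because HashingVectorizer keeps no vocabulary, the 'vectorizer' step needs no memory for a token dictionary. This is one reason it scales well to the many unigrams and bigrams produced by ngram_range=(1, 2).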
