Pickles
Finally, it is time for you to push your first model to production. It is a random forest classifier which you will use as a baseline while you work on developing a better alternative. You have access to the data, split into training and test sets with their usual names, X_train, X_test, y_train and y_test, as well as to the RandomForestClassifier() class and the pickle module, whose functions .load() and .dump() you will need for this exercise.
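As a minimal sketch of how pickle's .dump() and .load() pair up, the round trip below serializes a plain dictionary rather than a model. The dictionary, the file name settings.pkl and the temporary directory are illustrative assumptions, not part of the exercise.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model: any picklable Python object works.
settings = {"n_estimators": 100, "random_state": 42}

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "settings.pkl")

    # pickle writes bytes, so the file is opened in binary write mode.
    with open(path, "wb") as file:
        pickle.dump(settings, file)

    # Reading the object back needs binary read mode.
    with open(path, "rb") as file:
        restored = pickle.load(file)

# The restored object is an equal but separate copy of the original.
print(restored == settings)
```

Note that both open() calls use a binary mode; opening the file in text mode makes pickle.dump() fail, because pickle produces bytes rather than str.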
This exercise is part of the course Designing Machine Learning Workflows in Python.
Exercise instructions
- Fit a random forest classifier to the data. Fix the random seed to 42 to ensure that your results are reproducible.
- Write the model to file using pickle. Open the destination file using the with open(____) as ____ syntax.
- Now load the model from file into a different variable name, clf_from_file.
- Store the predictions from the model you loaded into a variable preds.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Fit a random forest to the training set
clf = ____(____=42).____(X_train, y_train)

# Save it to a file, to be pushed to production
with ____('model.pkl', ____) as ____:
    pickle.____(clf, file=file)

# Now load the model from file in the production environment
with ____ as file:
    clf_from_file = pickle.____(file)

# Predict the labels of the test dataset
preds = clf_from_file.____
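For reference, one possible way to complete the workflow end to end is sketched below. It is not the official solution. Because the exercise's data is not available outside the course environment, the sketch assumes synthetic data from make_classification in place of X_train, X_test, y_train and y_test, and it writes model.pkl into a temporary directory instead of the working directory.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: synthetic data stands in for the exercise's preloaded splits.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit a random forest to the training set, with a fixed seed for reproducibility.
clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")

    # Save the fitted model; pickle needs a file opened in binary write mode.
    with open(model_path, "wb") as file:
        pickle.dump(clf, file=file)

    # Load the model back, as a production process would.
    with open(model_path, "rb") as file:
        clf_from_file = pickle.load(file)

# Predict the labels of the test dataset with the reloaded model.
preds = clf_from_file.predict(X_test)

# The reloaded model should reproduce the original model's predictions.
same_as_original = (preds == clf.predict(X_test)).all()
print(same_as_original)
```

Two cautions apply when using this pattern outside an exercise: unpickling executes code embedded in the file, so only load pickles from sources you trust, and a pickled scikit-learn model is only reliably loadable with the same library version that saved it.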