
Pickles

Finally, it is time for you to push your first model to production. It is a random forest classifier that you will use as a baseline while you work on a better alternative. The data is already split into training and test sets with the usual names, X_train, X_test, y_train and y_test. You also have access to the RandomForestClassifier class and the pickle module, whose functions .dump() and .load() you will need for this exercise.
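As a warm-up, here is a minimal sketch of the pickle round trip on its own, before any model is involved. The settings dictionary and the settings.pkl file name are hypothetical stand-ins chosen for illustration; they are not part of the exercise.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model: any picklable Python object works.
settings = {"n_estimators": 100, "random_state": 42}

# Use a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "settings.pkl")

    # pickle writes bytes, so the file is opened in binary write mode ("wb").
    with open(path, "wb") as file:
        pickle.dump(settings, file)

    # Reading uses binary read mode ("rb").
    with open(path, "rb") as file:
        restored = pickle.load(file)

# The restored object is an equal copy, not the same object in memory.
print(restored == settings, restored is settings)
```

Note that pickle.load() can execute arbitrary code while deserializing, so it should only be used on files from a source you trust.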

This exercise is part of the course

Designing Machine Learning Workflows in Python


Exercise instructions

  • Fit a random forest classifier to the data. Fix the random seed to 42 to ensure that your results are reproducible.
  • Write the model to file using pickle. Open the destination file using the with open(____) as ____ syntax.
  • Now load the model from file into a different variable, clf_from_file.
  • Store the predictions from the model you loaded into a variable preds.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit a random forest to the training set
clf = ____(____=42).____(
  X_train, y_train)

# Save it to a file, to be pushed to production
with ____('model.pkl', ____) as ____:
    pickle.____(clf, file=file)

# Now load the model from file in the production environment
with ____ as file:
    clf_from_file = pickle.____(file)

# Predict the labels of the test dataset
preds = clf_from_file.____
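One possible way to complete the template is sketched below. It is not presented as the official solution. Because the exercise environment preloads X_train, X_test, y_train and y_test, this sketch assumes a synthetic dataset from make_classification as a stand-in, and it writes model.pkl into a temporary directory rather than the working directory. Both choices are assumptions made so that the example runs on its own.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: synthetic data stands in for the course's preloaded splits.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a random forest to the training set, with the seed fixed to 42
clf = RandomForestClassifier(random_state=42).fit(
    X_train, y_train)

# Save it to a file (a temporary location here, by assumption)
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as file:
    pickle.dump(clf, file=file)

# Now load the model from file, as the production environment would
with open(model_path, "rb") as file:
    clf_from_file = pickle.load(file)

# Predict the labels of the test dataset
preds = clf_from_file.predict(X_test)

# Sanity check: the reloaded model should agree with the original one
print(np.array_equal(preds, clf.predict(X_test)))
```

The final comparison is a quick way to confirm that the model survived the round trip unchanged. In practice, the environment that loads the file should use the same scikit-learn version that saved it, since pickled estimators are not guaranteed to load correctly across versions.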