Writing out your results to a csv for submission
At last, you're ready to submit some predictions for scoring. In this exercise, you'll write your predictions to a .csv file using the .to_csv() method on a pandas DataFrame. Then you'll evaluate your performance according to the LogLoss metric discussed earlier!
You'll need to make sure your submission obeys the correct format. To do this, you'll use your predictions values to create a new DataFrame, prediction_df.
Interpreting LogLoss & Beating the Benchmark:
When interpreting your log loss score, keep in mind that the score will change based on the number of samples tested. To get a sense of how this very basic model performs, compare your score to that of the DrivenData benchmark model, which merely submitted uniform probabilities for each class and scored 2.0455.
Remember, the lower the log loss the better. Is your model's log loss lower than 2.0455?
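To see why a uniform-probability submission is a natural baseline, here is a minimal sketch of log loss for a single multiclass problem. The toy labels, the four-class setup, and the helper multiclass_log_loss are assumptions made for illustration; this is not the course's scoring function and it does not reproduce the 2.0455 benchmark figure.

```python
import numpy as np

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class (hypothetical helper)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    rows = np.arange(len(y_true))
    return float(-np.mean(np.log(y_prob[rows, y_true])))

# Toy data (assumption): 6 samples, 4 classes
y_true = np.array([0, 1, 2, 3, 1, 0])
n_classes = 4

# Baseline: the same probability for every class
uniform = np.full((len(y_true), n_classes), 1.0 / n_classes)
uniform_loss = multiclass_log_loss(y_true, uniform)

# A model that puts 0.7 on the correct class and 0.1 on each other class
confident = np.full((len(y_true), n_classes), 0.1)
confident[np.arange(len(y_true)), y_true] = 0.7
confident_loss = multiclass_log_loss(y_true, confident)

print("uniform:", uniform_loss)      # equals ln(4) for 4 classes
print("confident:", confident_loss)  # equals -ln(0.7), lower is better
```

With uniform probabilities over K classes, the loss is ln(K) regardless of the true labels, so any model that has learned something useful should score below that.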
This exercise is part of the course “Case Study: School Budgeting with Machine Learning in Python”.
Exercise instructions
- Create the prediction_df DataFrame by passing the following arguments to the provided parameters of pd.DataFrame(): pd.get_dummies(df[LABELS]).columns for columns, holdout.index for index, and predictions for data.
- Save prediction_df to a csv file called 'predictions.csv' using the .to_csv() method.
- Submit the predictions for scoring by using the score_submission() function with pred_path set to 'predictions.csv'.
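The formatting step can be sketched on toy data. Everything below (the two label columns, the row IDs 101 and 205, and the probability values) is made up for illustration and stands in for df[LABELS], holdout.index, and predictions; the CSV is written to an in-memory buffer rather than a file so the sketch leaves nothing on disk.

```python
import io

import numpy as np
import pandas as pd

# Toy stand-in for df[LABELS] (assumption): two categorical label columns
labels = pd.DataFrame({
    "Function": ["Teach", "Admin", "Teach"],
    "Use": ["Salary", "Supplies", "Salary"],
})

# One output column per (label column, category) pair
label_columns = pd.get_dummies(labels).columns

# Toy stand-ins for holdout.index and the predict_proba output
holdout_index = pd.Index([101, 205])
predictions = np.array([
    [0.2, 0.8, 0.6, 0.4],
    [0.5, 0.5, 0.1, 0.9],
])

prediction_df = pd.DataFrame(columns=label_columns,
                             index=holdout_index,
                             data=predictions)

# Write to an in-memory buffer; a file path such as 'predictions.csv' works the same way
buffer = io.StringIO()
prediction_df.to_csv(buffer)

# Read it back to check that the row IDs and column names survived
buffer.seek(0)
round_trip = pd.read_csv(buffer, index_col=0)
print(round_trip)
```

The point of passing columns and index explicitly is that the scorer can match each probability to the right row and label; a bare array would lose both.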
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Generate predictions: predictions
predictions = clf.predict_proba(holdout[NUMERIC_COLUMNS].fillna(-1000))
# Format predictions in DataFrame: prediction_df
prediction_df = pd.DataFrame(columns=____,
                             index=____,
                             data=____)
# Save prediction_df to csv
____
# Submit the predictions for scoring: score
score = ____
# Print score
print('Your model, trained with numeric data only, yields logloss score: {}'.format(score))
This exercise is part of the course
Case Study: School Budgeting with Machine Learning in Python
Learn how to build a model to automatically classify items in a school budget.
In this chapter, you'll build a first-pass model. You'll use numeric data only to train the model. Spoiler alert - throwing out all of the text data is bad for performance! But you'll learn how to format your predictions. Then, you'll be introduced to natural language processing (NLP) in order to start working with the large amounts of text in the data.