Writing out your results to a csv for submission

At last, you're ready to submit some predictions for scoring. In this exercise, you'll write your predictions to a .csv using the .to_csv() method on a pandas DataFrame. Then you'll evaluate your performance according to the LogLoss metric discussed earlier!

You'll need to make sure your submission obeys the correct format.

To do this, you'll use your predicted values to create a new DataFrame, prediction_df.

Interpreting LogLoss & Beating the Benchmark:

When interpreting your log loss score, keep in mind that the score will change based on the number of samples tested. To get a sense of how this very basic model performs, compare your score to the DrivenData benchmark of 2.0455, which was obtained by a model that merely submitted uniform probabilities for each class.

Remember, the lower the log loss the better. Is your model's log loss lower than 2.0455?
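
To make the benchmark comparison concrete, here is a minimal sketch of a plain single-label log loss, written with NumPy. It is a simplification and not the competition's exact metric: the function name multiclass_log_loss, the four-class setup, and the sample labels are all hypothetical. It illustrates that uniform probabilities over K classes score ln(K) no matter what the true labels are, and that putting more probability on the true class lowers the score.

```python
import numpy as np

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class (hypothetical helper)."""
    probs = np.clip(probs, eps, 1 - eps)   # avoid log(0)
    rows = np.arange(len(y_true))
    return float(-np.mean(np.log(probs[rows, y_true])))

n_classes = 4                               # hypothetical number of classes
y_true = np.array([0, 2, 1, 3, 2])          # hypothetical true class indices

# Uniform guess: every class gets probability 1 / n_classes.
uniform = np.full((len(y_true), n_classes), 1.0 / n_classes)
uniform_score = multiclass_log_loss(y_true, uniform)

# A better model: 0.7 on the true class, 0.1 on each of the others.
informed = np.full((len(y_true), n_classes), 0.1)
informed[np.arange(len(y_true)), y_true] = 0.7
informed_score = multiclass_log_loss(y_true, informed)

print(uniform_score, np.log(n_classes))     # the two values agree
print(informed_score)                       # lower than the uniform score
```

A uniform-probability score therefore acts as a floor for usefulness: a model that has learned anything from the data should come in below it.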

This is a part of the course

“Case Study: School Budgeting with Machine Learning in Python”

Exercise instructions

  • Create the prediction_df DataFrame by passing the following arguments to the corresponding parameters of pd.DataFrame():
    • columns: pd.get_dummies(df[LABELS]).columns.
    • index: holdout.index.
    • data: predictions.
  • Save prediction_df to a csv file called 'predictions.csv' using the .to_csv() method.
  • Submit the predictions for scoring by using the score_submission() function with pred_path set to 'predictions.csv'.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Generate predictions: predictions
predictions = clf.predict_proba(holdout[NUMERIC_COLUMNS].fillna(-1000))

# Format predictions in DataFrame: prediction_df
prediction_df = pd.DataFrame(columns=____,
                             index=____,
                             data=____)


# Save prediction_df to csv
____

# Submit the predictions for scoring: score
score = ____

# Print score
print('Your model, trained with numeric data only, yields logloss score: {}'.format(score))
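
As a rough illustration of the formatting step, the sketch below builds a prediction_df from small made-up data and writes it to disk. Everything in it is a hypothetical stand-in: the two-label LABELS list, the toy df and holdout frames, and the constant 0.5 probabilities used in place of clf.predict_proba(...). The course-provided score_submission() function is not reproduced, so the sketch stops after saving and re-reading the file.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the course's objects (not the real data).
LABELS = ['Function', 'Use']
df = pd.DataFrame({
    'Function': ['Teacher', 'Admin', 'Teacher'],
    'Use': ['Instruction', 'Operations', 'Instruction'],
})
holdout = pd.DataFrame({'FTE': [1.0, 0.5]}, index=[101, 102])

# One output column per (label, class) pair, named like 'Function_Teacher'.
label_columns = pd.get_dummies(df[LABELS]).columns

# Placeholder probabilities standing in for clf.predict_proba(...).
predictions = np.full((len(holdout), len(label_columns)), 0.5)

prediction_df = pd.DataFrame(columns=label_columns,
                             index=holdout.index,
                             data=predictions)

# Written to a temporary directory here; the exercise uses 'predictions.csv'.
csv_path = os.path.join(tempfile.mkdtemp(), 'predictions.csv')
prediction_df.to_csv(csv_path)

# Reading the file back confirms that the index and column names were saved.
round_trip = pd.read_csv(csv_path, index_col=0)
print(round_trip)
```

Using holdout.index as the index keeps each row of probabilities tied to the holdout row it was predicted for, and the get_dummies column names label each probability with its class.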

Learn how to build a model to automatically classify items in a school budget.

In this chapter, you'll build a first-pass model. You'll use numeric data only to train the model. Spoiler alert - throwing out all of the text data is bad for performance! But you'll learn how to format your predictions. Then, you'll be introduced to natural language processing (NLP) in order to start working with the large amounts of text in the data.

  • Exercise 1: It's time to build a model
  • Exercise 2: Setting up a train-test split in scikit-learn
  • Exercise 3: Training a model
  • Exercise 4: Making predictions
  • Exercise 5: Use your model to predict values on holdout data
  • Exercise 6: Writing out your results to a csv for submission
  • Exercise 7: A very brief introduction to NLP
  • Exercise 8: Tokenizing text
  • Exercise 9: Testing your NLP credentials with n-grams
  • Exercise 10: Representing text numerically
  • Exercise 11: Creating a bag-of-words in scikit-learn
  • Exercise 12: Combining text columns for tokenization
  • Exercise 13: What's in a token?