Anatomy of a Machine Learning Model

Now you will reinforce your understanding of how data influences model performance. You will be working with the Airbnb booking dataset (in the file booking.csv). The dataset is suited to classification tasks that predict whether someone will cancel a booking, and it contains several numerical and categorical columns. You will split the provided dataset into three mutually exclusive samples - train_A.csv, train_B.csv, and test.csv - using the split_dataset.py script. Then, for each training dataset, you'll run the data processing and model training pipeline with model_training.py to train a Random Forest Classifier model and test its performance on the test set. The hyperparameters defined in params.json are the same in both runs, so the two runs differ only in their training data.
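To make "mutually exclusive samples" concrete, here is a minimal sketch of one way to cut a list of rows into three non-overlapping parts. It is not the contents of split_dataset.py: the function name `split_three_ways`, the `test_fraction` and `seed` parameters, and the 50/50 division of the remaining rows are all assumptions made for illustration.

```python
import random


def split_three_ways(rows, test_fraction=0.2, seed=42):
    """Shuffle rows, then cut them into train_a, train_b, and test.

    Hypothetical helper for illustration; the course's split_dataset.py
    may use different proportions or a different method.
    """
    shuffled = list(rows)
    # A seeded generator makes the shuffle reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    # Slice off the test rows first, then halve what is left.
    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    remaining = shuffled[n_test:]

    half = len(remaining) // 2
    train_a = remaining[:half]
    train_b = remaining[half:]
    return train_a, train_b, test
```

Because each part is a separate slice of the same shuffled list, no row can land in more than one sample, and every row lands in exactly one.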

The Python scripts are designed to accept command-line arguments and be run from the shell. Feel free to explore these scripts to enrich your understanding.
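As a rough idea of what a script that accepts command-line arguments can look like, the sketch below uses Python's standard `argparse` module. The flag names `--train`, `--test`, and `--params` and their defaults are hypothetical, chosen only to mirror the files mentioned above; the actual arguments of model_training.py may differ, so check the script itself.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical training script."""
    parser = argparse.ArgumentParser(
        description="Train a model on one training sample and evaluate it."
    )
    # Hypothetical flag names; the real script may use other arguments.
    parser.add_argument("--train", required=True, help="path to the training CSV")
    parser.add_argument("--test", default="test.csv", help="path to the test CSV")
    parser.add_argument("--params", default="params.json", help="path to the hyperparameter file")
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

Under these assumptions, calling `parse_args(["--train", "train_A.csv"])` returns a namespace whose `train` attribute is `"train_A.csv"`, while `test` and `params` fall back to their defaults. Running the same script twice with only the `--train` value changed is what keeps the two experiments comparable.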

This exercise is part of the course

Introduction to Data Versioning with DVC
