Aan de slagGa gratis aan de slag

Anatomy of a Machine Learning Model

Now, you will reinforce your understanding of how data influences the model performance. You will be working with the Airbnb booking dataset (in the file booking.csv). The dataset is suited for classification tasks to predict if someone would cancel a booking. It contains several numerical and categorical columns. You will split the provided dataset into three mutually exclusive samples - train_A.csv, train_B.csv, and test.csv - using split_dataset.py script. Further, for each training dataset, you'll run the data processing and model training pipeline to train a Random Forest Classifier model and test its performance on the test set by using model_training.py. The hyperparameters defined in params.json are consistent in both runs.

The Python scripts are designed to accept command line arguments and run via shell. Feel free to explore these scripts to enrich your understanding.

Deze oefening maakt deel uit van de cursus

Introduction to Data Versioning with DVC

Cursus bekijken

Praktische interactieve oefening

Zet theorie om in actie met een van onze interactieve oefeningen.

Begin met trainen