Anatomy of a Machine Learning Model
Now you will reinforce your understanding of how data influences model performance. You will work with the Airbnb booking dataset (in the file booking.csv). The dataset is suited to classification tasks that predict whether someone will cancel a booking. It contains several numerical and categorical columns.
You will split the provided dataset into three mutually exclusive samples (train_A.csv, train_B.csv, and test.csv) using the split_dataset.py script. Then, for each training dataset, you'll run the data processing and model training pipeline with model_training.py to train a Random Forest Classifier and test its performance on the test set. The hyperparameters defined in params.json are the same in both runs.
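To make "mutually exclusive samples" concrete, here is a minimal sketch of a three-way split. It is not the course's split_dataset.py; the function name three_way_split, the 40/40/20 proportions, and the seed are assumptions chosen for illustration, and plain integers stand in for the rows of booking.csv.

```python
import random


def three_way_split(rows, frac_a=0.4, frac_b=0.4, seed=42):
    """Shuffle rows and cut them into three non-overlapping samples.

    Hypothetical helper: the fractions and seed are illustrative
    assumptions, not values taken from split_dataset.py.
    """
    if frac_a < 0 or frac_b < 0 or frac_a + frac_b >= 1:
        raise ValueError("fractions must be non-negative and leave room for a test set")

    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle makes the split reproducible

    cut_a = int(len(shuffled) * frac_a)
    cut_b = cut_a + int(len(shuffled) * frac_b)

    # Slicing at two cut points guarantees each row lands in exactly one sample.
    return shuffled[:cut_a], shuffled[cut_a:cut_b], shuffled[cut_b:]


# Stand-in for the dataset rows: 100 row identifiers.
rows = list(range(100))
train_a, train_b, test = three_way_split(rows)
print(len(train_a), len(train_b), len(test))
```

Because the three samples are slices of one shuffled list, no row can appear in more than one of them, so any difference in test performance between the two runs comes from the training data rather than from leakage into the test set.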
The Python scripts accept command-line arguments and are run from the shell. Feel free to explore them to deepen your understanding.
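As a rough illustration of how a script can accept command-line arguments, the sketch below uses Python's standard argparse module. The flag names --train, --test, and --params are hypothetical; the actual arguments of split_dataset.py and model_training.py may differ, so read the scripts themselves for the real interface.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical training script.

    All flag names here are assumptions for illustration only.
    """
    parser = argparse.ArgumentParser(
        description="Train a model on one dataset and evaluate it on another."
    )
    parser.add_argument("--train", required=True, help="path to the training CSV")
    parser.add_argument("--test", required=True, help="path to the test CSV")
    parser.add_argument(
        "--params", default="params.json", help="path to the hyperparameter file"
    )
    return parser


# Parse an explicit argument list here; a real script would call
# parse_args() with no arguments to read them from the shell.
args = build_parser().parse_args(["--train", "train_A.csv", "--test", "test.csv"])
print(args.train, args.test, args.params)
```

With an interface like this, the two runs would differ only in the value passed for the training file, while the test file and hyperparameter file stay the same.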
This exercise is part of the course Introduction to Data Versioning with DVC.
