CommencerCommencer gratuitement

Predicting the rating of an app

Having explored the Google apps dataset in the previous exercise, let's now build a model that predicts the rating of an app given a subset of its features.

To do this, you'll use scikit-learn's DecisionTreeRegressor. As decision trees are the building blocks of many ensemble models, refreshing your memory of how they work will serve you well throughout this course.

We'll use the MAE (mean absolute error) as the evaluation metric. This metric is highly interpretable, as it represents the average absolute difference between actual and predicted ratings.

All required modules have been pre-imported for you. The features and target are available in the variables X and y, respectively.

Cet exercice fait partie du cours

Ensemble Methods in Python

Afficher le cours

Instructions

  • Use train_test_split() to split X and y into train and test sets. Use 20%, or 0.2, as the test size.
  • Instantiate a DecisionTreeRegressor(), reg_dt, with the following hyperparameters: min_samples_leaf = 3 and min_samples_split = 9.
  • Fit the regressor to the training set using .fit().
  • Predict the labels of the test set using .predict().

Exercice interactif pratique

Essayez cet exercice en complétant cet exemple de code.

# Split into train (80%) and test (20%) sets
X_train, X_test, y_train, y_test = ____(____, ____, ____, random_state=42)

# Instantiate the regressor
reg_dt = ____(____, ____, random_state=500)

# Fit to the training set
____

# Evaluate the performance of the model on the test set
y_pred = ____
print('MAE: {:.3f}'.format(mean_absolute_error(y_test, y_pred)))
Modifier et exécuter le code