Predicting the rating of an app
Having explored the Google apps dataset in the previous exercise, let's now build a model that predicts the rating of an app given a subset of its features.
To do this, you'll use scikit-learn's DecisionTreeRegressor. As decision trees are the building blocks of many ensemble models, refreshing your memory of how they work will serve you well throughout this course.
We'll use the MAE (mean absolute error) as the evaluation metric. This metric is highly interpretable, as it represents the average absolute difference between actual and predicted ratings.
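To make the metric concrete, here is a minimal sketch that computes the MAE by hand and with scikit-learn's mean_absolute_error(). The rating values and the names y_true, y_hat, mae_manual, and mae_sklearn are made up for illustration and do not come from the Google apps dataset.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical actual and predicted ratings (illustrative values only)
y_true = np.array([4.5, 3.0, 4.0, 5.0])
y_hat = np.array([4.0, 3.5, 4.0, 4.0])

# MAE by hand: the mean of the absolute differences
mae_manual = np.mean(np.abs(y_true - y_hat))

# The same quantity computed by scikit-learn
mae_sklearn = mean_absolute_error(y_true, y_hat)

print('MAE (manual): {:.3f}'.format(mae_manual))
print('MAE (sklearn): {:.3f}'.format(mae_sklearn))
```

The absolute differences here are 0.5, 0.5, 0.0, and 1.0, so both lines report an MAE of 0.500: on average, the predictions are off by half a rating point.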
All required modules have been pre-imported for you. The features and target are available in the variables X and y, respectively.
This exercise is part of the course Ensemble Methods in Python.
Exercise instructions
- Use train_test_split() to split X and y into train and test sets. Use 20%, or 0.2, as the test size.
- Instantiate a DecisionTreeRegressor(), reg_dt, with the following hyperparameters: min_samples_leaf = 3 and min_samples_split = 9.
- Fit the regressor to the training set using .fit().
- Predict the labels of the test set using .predict().
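To see what the two hyperparameters in the instructions constrain, the sketch below fits a tree on synthetic stand-in data (not the Google apps dataset) and inspects the fitted tree structure. The names X_demo, y_demo, and reg_demo are hypothetical. In scikit-learn, min_samples_leaf sets the minimum number of training samples in any leaf, and min_samples_split sets the minimum number of samples a node needs before it may be split.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; not the real features and ratings)
rng = np.random.RandomState(0)
X_demo = rng.rand(200, 3)
y_demo = 1 + 4 * rng.rand(200)  # fake "ratings" between 1 and 5

reg_demo = DecisionTreeRegressor(min_samples_leaf=3,
                                 min_samples_split=9,
                                 random_state=500)
reg_demo.fit(X_demo, y_demo)

# In the fitted tree, leaves are the nodes with no left child (-1)
tree = reg_demo.tree_
is_leaf = tree.children_left == -1
leaf_sizes = tree.n_node_samples[is_leaf]
split_sizes = tree.n_node_samples[~is_leaf]

# Every leaf holds at least 3 samples; every split node held at least 9
print('Smallest leaf size:', leaf_sizes.min())
print('Smallest split-node size:', split_sizes.min())
```

Larger values for either setting produce a shallower tree, which tends to reduce overfitting at the cost of some flexibility.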
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Split into train (80%) and test (20%) sets
X_train, X_test, y_train, y_test = ____(____, ____, ____, random_state=42)
# Instantiate the regressor
reg_dt = ____(____, ____, random_state=500)
# Fit to the training set
____
# Evaluate the performance of the model on the test set
y_pred = ____
print('MAE: {:.3f}'.format(mean_absolute_error(y_test, y_pred)))
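For reference, one way the completed template might look is sketched below. Because the exercise's X and y are not available here, the sketch builds a synthetic stand-in for them (an assumption, clearly not the real data), so the printed MAE will not match the value from the actual exercise.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the exercise's X and y (assumption, not real data)
rng = np.random.RandomState(42)
X = rng.rand(500, 4)
y = np.clip(3 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=500), 1, 5)

# Split into train (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Instantiate the regressor with the hyperparameters from the instructions
reg_dt = DecisionTreeRegressor(min_samples_leaf=3,
                               min_samples_split=9,
                               random_state=500)

# Fit to the training set
reg_dt.fit(X_train, y_train)

# Evaluate the performance of the model on the test set
y_pred = reg_dt.predict(X_test)
print('MAE: {:.3f}'.format(mean_absolute_error(y_test, y_pred)))
```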