Predicting the rating of an app
Having explored the Google apps dataset in the previous exercise, let's now build a model that predicts the rating of an app given a subset of its features.
To do this, you'll use scikit-learn's DecisionTreeRegressor. As decision trees are the building blocks of many ensemble models, refreshing your memory of how they work will serve you well throughout this course.
We'll use the MAE (mean absolute error) as the evaluation metric. This metric is highly interpretable, as it represents the average absolute difference between actual and predicted ratings.
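To make the metric concrete, here is a minimal sketch that computes the MAE by hand. The ratings below are made-up values for illustration, not taken from the dataset; scikit-learn's mean_absolute_error computes the same quantity.

```python
# Hypothetical actual and predicted app ratings (illustrative values only)
y_true = [4.0, 3.5, 5.0, 2.0]
y_hat = [4.5, 3.0, 4.0, 2.0]

# MAE: the mean of the absolute differences between actual and predicted values
abs_errors = [abs(actual - predicted) for actual, predicted in zip(y_true, y_hat)]
mae = sum(abs_errors) / len(abs_errors)

print('MAE: {:.3f}'.format(mae))
```

An MAE of 0.5 here would mean the predictions are off by half a rating point on average.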
All required modules have been pre-imported for you. The features and target are available in the variables X and y, respectively.
This exercise is part of the course
Ensemble Methods in Python
Exercise instructions
- Use train_test_split() to split X and y into train and test sets. Use 20%, or 0.2, as the test size.
- Instantiate a DecisionTreeRegressor(), reg_dt, with the following hyperparameters: min_samples_leaf = 3 and min_samples_split = 9.
- Fit the regressor to the training set using .fit().
- Predict the labels of the test set using .predict().
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Split into train (80%) and test (20%) sets
X_train, X_test, y_train, y_test = ____(____, ____, ____, random_state=42)
# Instantiate the regressor
reg_dt = ____(____, ____, random_state=500)
# Fit to the training set
____
# Evaluate the performance of the model on the test set
y_pred = ____
print('MAE: {:.3f}'.format(mean_absolute_error(y_test, y_pred)))
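For reference, one way the blanks might be filled in is sketched below. Because the course's X and y are not available outside the exercise environment, this sketch substitutes synthetic stand-in data (an assumption made purely so the example runs on its own), so the printed MAE will not match the exercise's result.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the course's X and y (hypothetical data, not the apps dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.clip(4.0 + 0.3 * X[:, 0] + rng.normal(scale=0.2, size=200), 1.0, 5.0)

# Split into train (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate the regressor with the hyperparameters from the instructions
reg_dt = DecisionTreeRegressor(min_samples_leaf=3, min_samples_split=9, random_state=500)

# Fit to the training set
reg_dt.fit(X_train, y_train)

# Evaluate the performance of the model on the test set
y_pred = reg_dt.predict(X_test)
print('MAE: {:.3f}'.format(mean_absolute_error(y_test, y_pred)))
```

Setting min_samples_leaf and min_samples_split above their defaults limits how finely the tree can partition the training data, which tends to reduce overfitting.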