
Predicting the rating of an app

Having explored the Google apps dataset in the previous exercise, you'll now build a model that predicts the rating of an app given a subset of its features.

To do this, you'll use scikit-learn's DecisionTreeRegressor. As decision trees are the building blocks of many ensemble models, refreshing your memory of how they work will serve you well throughout this course.

We'll use the mean absolute error (MAE) as the evaluation metric. It is easy to interpret: it is the average absolute difference between actual and predicted ratings, expressed in the same units as the ratings themselves.
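
To make the metric concrete, here is a minimal sketch that computes the MAE by hand. The ratings are made-up values for illustration, not taken from the dataset, and mean_absolute_error_manual is a hypothetical helper name; scikit-learn's mean_absolute_error computes the same quantity.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up ratings, for illustration only
actual_ratings = [4.5, 3.0, 4.0, 2.5]
predicted_ratings = [4.0, 3.5, 4.0, 3.5]

# The absolute errors are 0.5, 0.5, 0.0 and 1.0
mae = mean_absolute_error_manual(actual_ratings, predicted_ratings)
print('MAE: {:.3f}'.format(mae))
```

An MAE of 0.5 here would mean the predictions are off by half a rating point on average.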

All required modules have been pre-imported for you. The features and target are available in the variables X and y, respectively.

This exercise is part of the course

Ensemble Methods in Python


Exercise instructions

  • Use train_test_split() to split X and y into train and test sets. Use 20%, or 0.2, as the test size.
  • Instantiate a DecisionTreeRegressor() named reg_dt with the following hyperparameters: min_samples_leaf=3 and min_samples_split=9.
  • Fit the regressor to the training set using .fit().
  • Predict the ratings for the test set using .predict().
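
As a rough illustration of what the two hyperparameters in the steps above control, the sketch below fits a tree on synthetic data and inspects the node sizes. The data is an assumption standing in for the exercise's X and y, and the names X_demo, y_demo and reg_demo are hypothetical. In scikit-learn, min_samples_leaf is the minimum number of samples a leaf may hold, and min_samples_split is the minimum number of samples a node needs before it can be split.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption: this is not the Google apps dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 4.0 + 0.3 * X_demo[:, 0] + rng.normal(scale=0.2, size=200)

reg_demo = DecisionTreeRegressor(min_samples_leaf=3, min_samples_split=9, random_state=500)
reg_demo.fit(X_demo, y_demo)

# In the fitted tree, leaves are the nodes without a left child (marked -1)
tree = reg_demo.tree_
is_leaf = tree.children_left == -1
leaf_sizes = tree.n_node_samples[is_leaf]
split_sizes = tree.n_node_samples[~is_leaf]

print('Smallest leaf:', leaf_sizes.min())         # never below min_samples_leaf
print('Smallest split node:', split_sizes.min())  # never below min_samples_split
```

Raising either value tends to produce a shallower tree with larger leaves, which usually reduces overfitting at the cost of some flexibility.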

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Split into train (80%) and test (20%) sets
X_train, X_test, y_train, y_test = ____(____, ____, ____, random_state=42)

# Instantiate the regressor
reg_dt = ____(____, ____, random_state=500)

# Fit to the training set
____

# Evaluate the performance of the model on the test set
y_pred = ____
print('MAE: {:.3f}'.format(mean_absolute_error(y_test, y_pred)))
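
For reference, one way the completed template might look is sketched below. Because the exercise's X and y are not shown here, the sketch generates synthetic stand-ins (an assumption), so the printed MAE will not match the value obtained on the Google apps data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-ins for the pre-loaded X and y (assumption, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.clip(4.0 + 0.3 * X[:, 0] + rng.normal(scale=0.2, size=200), 1.0, 5.0)

# Split into train (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate the regressor
reg_dt = DecisionTreeRegressor(min_samples_leaf=3, min_samples_split=9, random_state=500)

# Fit to the training set
reg_dt.fit(X_train, y_train)

# Evaluate the performance of the model on the test set
y_pred = reg_dt.predict(X_test)
print('MAE: {:.3f}'.format(mean_absolute_error(y_test, y_pred)))
```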