Instantiate the model
In the following set of exercises, you'll diagnose the bias and variance problems of a regression tree. The regression tree you'll define in this exercise will be used to predict the fuel efficiency (mpg) of cars in the auto dataset using all available features.
We have already processed the data and loaded the features matrix X and the array y in your workspace. In addition, the DecisionTreeRegressor class was imported from sklearn.tree.
This exercise is part of the course “Machine Learning with Tree-Based Models in Python”.
Exercise instructions
- Import train_test_split from sklearn.model_selection.
- Split the data into 70% train and 30% test.
- Instantiate a DecisionTreeRegressor with a max depth of 4 and min_samples_leaf set to 0.26.
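A note on the last instruction: in scikit-learn, a float passed as min_samples_leaf is read as a fraction of the training samples, so 0.26 asks every leaf to hold at least ceil(0.26 * n_samples) samples. The sketch below illustrates this on synthetic stand-in data. The auto dataset is not reproduced here, so the names X_demo, y_demo and dt_demo, the 200-row shape and the values are assumptions made for illustration only.

```python
import math

import numpy as np
from sklearn.tree import DecisionTreeRegressor

SEED = 1
rng = np.random.default_rng(SEED)

# Synthetic stand-in data (assumption: 200 rows, 5 numeric features)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=200)

# Same hyperparameters as in the exercise instructions
dt_demo = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26, random_state=SEED)
dt_demo.fit(X_demo, y_demo)

# apply() returns the index of the leaf each sample ends up in
leaf_ids = dt_demo.apply(X_demo)
leaf_sizes = np.unique(leaf_ids, return_counts=True)[1]

# Smallest leaf size allowed when min_samples_leaf is a fraction
min_required = math.ceil(0.26 * len(X_demo))

print("leaf sizes:", leaf_sizes)
print("minimum required per leaf:", min_required)
```

Because every leaf must contain at least 26% of the training samples, the fitted tree can have at most three leaves regardless of max_depth, which makes it a deliberately constrained model.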
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import train_test_split from sklearn.model_selection
____
# Set SEED for reproducibility
SEED = 1
# Split the data into 70% train and 30% test
X_train, X_test, y_train, y_test = ____(____, ____, test_size=____, random_state=SEED)
# Instantiate a DecisionTreeRegressor dt
dt = ____(____=____, ____=____, random_state=SEED)
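One possible way to fill in the blanks is sketched below as a self-contained script. The preloaded X and y exist only inside the exercise workspace, so synthetic stand-in arrays are used here; their shapes and values are assumptions, not the auto dataset.

```python
import numpy as np

# Import train_test_split from sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Set SEED for reproducibility
SEED = 1

# Stand-in data (assumption): 100 rows, 4 numeric features
rng = np.random.default_rng(SEED)
X = rng.normal(size=(100, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.3, size=100)

# Split the data into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=SEED
)

# Instantiate a DecisionTreeRegressor dt
dt = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26, random_state=SEED)

print(X_train.shape, X_test.shape)
```

The model is only instantiated at this point, not fitted. Fixing random_state in both the split and the tree keeps the results repeatable from run to run.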
In this course, you'll learn how to use tree-based models and ensembles for regression and classification using scikit-learn.
The bias-variance tradeoff is one of the fundamental concepts in supervised machine learning. In this chapter, you'll understand how to diagnose the problems of overfitting and underfitting. You'll also be introduced to the concept of ensembling where the predictions of several models are aggregated to produce predictions that are more robust.