Instantiate the model

In the following set of exercises, you'll diagnose the bias and variance problems of a regression tree. The regression tree you define in this exercise will be used to predict the fuel efficiency (miles per gallon, mpg) of cars in the auto dataset, using all available features.

We have already processed the data and loaded the feature matrix X and the target array y into your workspace. The DecisionTreeRegressor class has also been imported from sklearn.tree.

This exercise is part of the course

“Machine Learning with Tree-Based Models in Python”

Exercise instructions

  • Import train_test_split from sklearn.model_selection.
  • Split the data into 70% train and 30% test.
  • Instantiate a DecisionTreeRegressor named dt with max_depth set to 4 and min_samples_leaf set to 0.26.
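The value 0.26 for min_samples_leaf is a float, and scikit-learn reads a float here as a fraction of the training samples rather than a count: each leaf must hold at least ceil(0.26 * n_samples) samples. The sketch below checks this on synthetic data from make_regression, used only as a stand-in for the auto dataset; the 274 rows and 6 features are arbitrary assumptions, and the names X_demo, y_demo and dt_demo are hypothetical.

```python
import math

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the auto data (assumption: 274 rows, 6 features)
X_demo, y_demo = make_regression(n_samples=274, n_features=6, noise=10.0, random_state=1)

# A float min_samples_leaf is a fraction of the training samples:
# each leaf must hold at least ceil(0.26 * n_samples) samples.
min_leaf = math.ceil(0.26 * X_demo.shape[0])
print(min_leaf)  # 72

dt_demo = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26, random_state=1)
dt_demo.fit(X_demo, y_demo)

# In the fitted tree_ structure, children_left == -1 marks a leaf node,
# and n_node_samples stores how many training samples reached each node.
tree = dt_demo.tree_
is_leaf = tree.children_left == -1
leaf_sizes = tree.n_node_samples[is_leaf]
print(leaf_sizes.min() >= min_leaf)  # True
```

Because four leaves would need at least 4 × 26% = 104% of the training rows, a tree with this setting can have at most three leaves and a depth of at most 2, so max_depth=4 never actually binds. This heavy regularization is worth keeping in mind when you diagnose bias later in the chapter.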

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import train_test_split from sklearn.model_selection
____

# Set SEED for reproducibility
SEED = 1

# Split the data into 70% train and 30% test
X_train, X_test, y_train, y_test = ____(____, ____, test_size=____, random_state=SEED)

# Instantiate a DecisionTreeRegressor dt
dt = ____(____=____, ____=____, random_state=SEED)
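A completed version of the template, runnable outside the exercise workspace, might look like the following sketch. Since X and y are only preloaded in the course environment, this assumes synthetic data from make_regression as a stand-in (392 rows and 6 features are arbitrary choices, not the real auto dataset).

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in for the preloaded X and y (assumption: synthetic data, not the auto dataset)
X, y = make_regression(n_samples=392, n_features=6, noise=10.0, random_state=0)

# Set SEED for reproducibility
SEED = 1

# Split the data into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=SEED)

# Instantiate a DecisionTreeRegressor dt (not fitted yet)
dt = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26, random_state=SEED)
```

With test_size=0.3, train_test_split rounds the test-set size up, so 392 rows split into 274 training and 118 test rows. The tree is only instantiated here; fitting and evaluating it are left to the following exercises.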

In this course, you'll learn how to use tree-based models and ensembles for regression and classification using scikit-learn.

The bias-variance tradeoff is one of the fundamental concepts in supervised machine learning. In this chapter, you'll learn how to diagnose overfitting and underfitting. You'll also be introduced to ensembling, where the predictions of several models are aggregated to produce more robust predictions.

  • Exercise 1: Generalization Error
  • Exercise 2: Complexity, bias and variance
  • Exercise 3: Overfitting and underfitting
  • Exercise 4: Diagnose bias and variance problems
  • Exercise 5: Instantiate the model
  • Exercise 6: Evaluate the 10-fold CV error
  • Exercise 7: Evaluate the training error
  • Exercise 8: High bias or high variance?
  • Exercise 9: Ensemble Learning
  • Exercise 10: Define the ensemble
  • Exercise 11: Evaluate individual classifiers
  • Exercise 12: Better performance with a Voting Classifier