Predicting GoT deaths

While the target variable does not have any missing values, other features do. As the focus of the course is not on data cleaning and preprocessing, we have already done the following preprocessing for you (sketched in code after this list):

  • Replaced NA values with 0.
  • Replaced negative values of age with 0.
  • Replaced NA values of age with the mean.
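
A minimal sketch of these steps with pandas is shown below, assuming a hypothetical DataFrame got with an 'age' column; the names and values are illustrative, not the course's actual dataset:

import numpy as np
import pandas as pd

# Illustrative data only -- the real course dataset is pre-loaded
got = pd.DataFrame({'age': [25, -1, np.nan, 40],
                    'popularity': [0.3, np.nan, 0.7, 0.9]})

# Clip negative ages to 0, then fill missing ages with the mean age
got['age'] = got['age'].clip(lower=0)
got['age'] = got['age'].fillna(got['age'].mean())

# Fill the remaining missing values in the other features with 0
got = got.fillna(0)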

Let's now build an ensemble model using the averaging technique. The following individual models have been built:

  • Logistic Regression (clf_lr).
  • Decision Tree (clf_dt).
  • Support Vector Machine (clf_svm).

As the target is binary, each of these models may perform well individually. Your objective is to combine them using averaging. Recall from the video that averaging the predicted class probabilities is the same as soft voting, so you should still use VotingClassifier().
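
To see why averaging and soft voting coincide, here is a small self-contained sketch on toy data (not the Game of Thrones dataset): averaging the estimators' predicted probabilities by hand yields the same predictions as VotingClassifier with voting='soft'.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Toy binary classification data, only to illustrate the equivalence
X, y = make_classification(n_samples=200, random_state=0)

# Soft voting averages the class probabilities of the fitted estimators
clf_vote = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('dt', DecisionTreeClassifier(random_state=0))],
    voting='soft')
clf_vote.fit(X, y)

# Averaging predict_proba by hand gives the same predictions
avg_proba = np.mean([est.predict_proba(X) for est in clf_vote.estimators_], axis=0)
assert np.array_equal(np.argmax(avg_proba, axis=1), clf_vote.predict(X))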

This exercise is part of the course Ensemble Methods in Python.

Exercise instructions

  • Set up the list of (string, estimator) tuples. Use 'lr' for clf_lr, 'dt' for clf_dt, and 'svm' for clf_svm.
  • Build an averaging classifier called clf_avg. Be sure to specify an argument for the voting parameter.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Build the individual models
clf_lr = LogisticRegression(class_weight='balanced')
clf_dt = DecisionTreeClassifier(min_samples_leaf=3, min_samples_split=9, random_state=500)
clf_svm = SVC(probability=True, class_weight='balanced', random_state=500)

# List of (string, estimator) tuples
estimators = ____

# Build and fit an averaging classifier
clf_avg = ____
clf_avg.fit(X_train, y_train)

# Evaluate model performance
acc_avg = accuracy_score(y_test, clf_avg.predict(X_test))
print('Accuracy: {:.2f}'.format(acc_avg))
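
For reference, one possible way to fill in the blanks is sketched below. The scikit-learn imports are an assumption about what the exercise environment pre-loads, and X_train, X_test, y_train, y_test are taken to be already defined by the course:

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Build the individual models
clf_lr = LogisticRegression(class_weight='balanced')
clf_dt = DecisionTreeClassifier(min_samples_leaf=3, min_samples_split=9, random_state=500)
clf_svm = SVC(probability=True, class_weight='balanced', random_state=500)

# List of (string, estimator) tuples
estimators = [('lr', clf_lr), ('dt', clf_dt), ('svm', clf_svm)]

# Build and fit an averaging (soft voting) classifier
clf_avg = VotingClassifier(estimators=estimators, voting='soft')
clf_avg.fit(X_train, y_train)  # X_train, y_train assumed pre-loaded by the exercise

# Evaluate model performance
acc_avg = accuracy_score(y_test, clf_avg.predict(X_test))
print('Accuracy: {:.2f}'.format(acc_avg))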