Predicting GoT deaths
While the target variable does not have any missing values, other features do. As the focus of the course is not on data cleaning and preprocessing, we have already done the following preprocessing for you:
- Replaced NA values with 0.
- Replaced negative values of age with 0.
- Replaced NA values of age with the mean.
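These cleaning steps can be sketched in pandas on a toy frame. The column names (`age`, `popularity`) and values here are illustrative stand-ins, not the actual GoT features; note that the age column is cleaned before the blanket fill with 0, so missing ages get the mean rather than 0:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame standing in for the GoT character data
df = pd.DataFrame({'age': [25.0, -3.0, np.nan, 40.0],
                   'popularity': [0.5, np.nan, 0.8, 0.2]})

# Replace negative values of age with 0
df.loc[df['age'] < 0, 'age'] = 0

# Replace NA values of age with the mean
df['age'] = df['age'].fillna(df['age'].mean())

# Replace the remaining NA values with 0
df = df.fillna(0)
```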
Let's now build an ensemble model using the averaging technique. The following individual models have been built:
- Logistic Regression (`clf_lr`)
- Decision Tree (`clf_dt`)
- Support Vector Machine (`clf_svm`)
As the target is binary, all three models are classifiers that can estimate class probabilities, and each might perform well on its own.
Your objective is to combine them using averaging. Recall from the video that this is the same as a soft voting approach, so you should still use `VotingClassifier()`.
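To see why averaging (soft voting) can differ from hard voting, here is a minimal numeric sketch with made-up probabilities for the positive class from three classifiers. A majority vote over the thresholded labels and an average over the raw probabilities can disagree:

```python
import numpy as np

# Hypothetical class-1 probabilities from three classifiers
p_lr, p_dt, p_svm = 0.9, 0.4, 0.45

# Hard voting: threshold each probability, then take the majority label
hard_votes = [int(p >= 0.5) for p in (p_lr, p_dt, p_svm)]
hard_pred = int(sum(hard_votes) >= 2)   # two of three vote 0

# Soft voting (averaging): average the probabilities, then threshold once
avg_prob = np.mean([p_lr, p_dt, p_svm])  # 0.583...
soft_pred = int(avg_prob >= 0.5)

print(hard_pred, soft_pred)  # 0 1
```

The very confident logistic regression (0.9) outweighs the two lukewarm "no" votes in the average, which is exactly the information hard voting throws away.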
This exercise is part of the course Ensemble Methods in Python.
Exercise instructions
- Set up the list of `(string, estimator)` tuples. Use `'lr'` for `clf_lr`, `'dt'` for `clf_dt`, and `'svm'` for `clf_svm`.
- Build an averaging classifier called `clf_avg`. Be sure to specify an argument for the `voting` parameter.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Imports used below (preloaded in the exercise environment)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Build the individual models
clf_lr = LogisticRegression(class_weight='balanced')
clf_dt = DecisionTreeClassifier(min_samples_leaf=3, min_samples_split=9, random_state=500)
clf_svm = SVC(probability=True, class_weight='balanced', random_state=500)
# List of (string, estimator) tuples
estimators = ____
# Build and fit an averaging classifier
clf_avg = ____
clf_avg.fit(X_train, y_train)
# Evaluate model performance
acc_avg = accuracy_score(y_test, clf_avg.predict(X_test))
print('Accuracy: {:.2f}'.format(acc_avg))
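One way to fill in the blanks is sketched below. Since the GoT dataset and train/test split are only available inside the exercise environment, this version substitutes a synthetic dataset from `make_classification` so it runs end to end; everything else follows the instructions above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Stand-in data (the exercise uses the preloaded GoT dataset instead)
X, y = make_classification(n_samples=300, random_state=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=500)

# Build the individual models
clf_lr = LogisticRegression(class_weight='balanced')
clf_dt = DecisionTreeClassifier(min_samples_leaf=3, min_samples_split=9, random_state=500)
clf_svm = SVC(probability=True, class_weight='balanced', random_state=500)

# List of (string, estimator) tuples
estimators = [('lr', clf_lr), ('dt', clf_dt), ('svm', clf_svm)]

# Build and fit an averaging classifier: averaging = soft voting,
# which averages the predicted class probabilities of the estimators
clf_avg = VotingClassifier(estimators=estimators, voting='soft')
clf_avg.fit(X_train, y_train)

# Evaluate model performance
acc_avg = accuracy_score(y_test, clf_avg.predict(X_test))
print('Accuracy: {:.2f}'.format(acc_avg))
```

Note that `SVC` needs `probability=True` for soft voting to work, since `VotingClassifier` with `voting='soft'` calls each estimator's `predict_proba`.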