Cross-validation using sklearn

As explained in Chapter 2, overfitting the dataset is a common problem in analytics. This happens when a model has learned the data too closely: it has great performances on the dataset it was trained on, but fails to generalize outside of it.

While the train/test split technique you learned in Chapter 2 ensures that the model does not overfit the training set, hyperparameter tuning may result in overfitting the test component, since it consists in tuning the model to get the best prediction results on the test set. Therefore, it is recommended to validate the model on different testing sets. K-fold cross-validation allows us to achieve this:

it splits the dataset into a training set and a testing set
it fits the model, makes predictions and calculates a score (you can specify if you want the accuracy, precision, recall…)
it repeats the process k times in total
it outputs the average of the 10 scores

In this exercise, you will use Cross Validation on our dataset, and evaluate our results with the cross_val_score function.

Este exercício faz parte do curso

HR Analytics: Predicting Employee Churn in Python

Ver curso

Instruções do exercício

Import the function for implementing cross-validation, cross_val_score(), from the module sklearn.model_selection.
Print the cross-validation score for your model, specifying 10 folds with the cv hyperparameter.

Exercício interativo prático

Experimente este exercício completando este código de exemplo.

# Import the function for implementing cross validation
from sklearn.model_selection import ____

# Use that function to print the cross validation score for 10 folds
print(____(model,features,target,____=10))

Editar e executar o código