Overfitting and underfitting

Interpreting model complexity is a good way to evaluate the performance of a supervised learning model. Your aim is to build a model that captures the relationship between the features and the target variable and that also generalizes well to new observations.
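To make the idea of generalization concrete, here is a minimal sketch comparing a very complex and a very simple KNN model. It assumes synthetic data from make_classification as a stand-in for the churn dataset, and the variable names (X_tr, results, and so on) are illustrative only, not part of the exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic data stands in for the churn dataset
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

results = {}
# k=1 memorizes the training set (most complex model);
# k=len(X_tr) votes over all training points, so every
# prediction is the same class (simplest model)
for k in (1, len(X_tr)):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_tr, y_tr)
    train_acc = knn.score(X_tr, y_tr)
    test_acc = knn.score(X_te, y_te)
    results[k] = (train_acc, test_acc)
    print(f"k={k}: train={train_acc:.3f}, test={test_acc:.3f}, "
          f"gap={train_acc - test_acc:.3f}")
```

A large gap between training and test accuracy suggests overfitting, while low accuracy on both sets suggests underfitting.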

The training and test sets have been created from the churn_df dataset and preloaded as X_train, X_test, y_train, and y_test.

In addition, KNeighborsClassifier has been imported for you, along with numpy as np.
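Outside the course environment, nothing is preloaded. The following sketch shows one way a comparable setup might be created. It assumes synthetic data in place of churn_df, and the 80/20 split and random_state value are illustrative choices, not the course's actual settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: a synthetic binary classification problem replaces churn_df
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=42)

# Hold out 20% of the rows for testing; stratify keeps class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```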

This exercise is part of the course

Supervised Learning with scikit-learn

Exercise instructions

Create neighbors as a numpy array of values from 1 up to and including 12.
Instantiate a KNeighborsClassifier, with the number of neighbors equal to the neighbor iterator.
Fit the model to the training data.
Calculate accuracy scores for the training set and the test set separately using the .score() method, and store the results in the train_accuracies and test_accuracies dictionaries, respectively, using the neighbor iterator as the index.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Create neighbors
neighbors = np.arange(____, ____)
train_accuracies = {}
test_accuracies = {}

for neighbor in neighbors:

    # Set up a KNN Classifier
    knn = ____(____=____)

    # Fit the model
    knn.____(____, ____)

    # Compute accuracy
    train_accuracies[____] = knn.____(____, ____)
    test_accuracies[____] = knn.____(____, ____)
print(neighbors, '\n', train_accuracies, '\n', test_accuracies)
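For reference, one possible completion of the template is sketched below. Because the preloaded churn data is not available outside the course environment, this version assumes a synthetic dataset, so the printed accuracies will differ from those in the exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic data stands in for the preloaded churn split
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# np.arange excludes the stop value, so 13 yields 1 through 12
neighbors = np.arange(1, 13)
train_accuracies = {}
test_accuracies = {}

for neighbor in neighbors:
    # Set up a KNN classifier with the current number of neighbors
    knn = KNeighborsClassifier(n_neighbors=neighbor)

    # Fit the model to the training data
    knn.fit(X_train, y_train)

    # Compute accuracy on the training and test sets
    train_accuracies[neighbor] = knn.score(X_train, y_train)
    test_accuracies[neighbor] = knn.score(X_test, y_test)

print(neighbors, '\n', train_accuracies, '\n', test_accuracies)
```

With a single neighbor, each training point is its own nearest neighbor, so training accuracy is perfect. Test accuracy is the more informative number when choosing the number of neighbors.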


In this chapter, you'll be introduced to classification problems and learn how to solve them using supervised learning techniques. You'll learn how to split data into training and test sets, fit a model, make predictions, and evaluate accuracy. You’ll discover the relationship between model complexity and performance, applying what you learn to a churn dataset, where you will classify the churn status of a telecom company's customers.

Exercise 1: Machine learning with scikit-learn
Exercise 2: Binary classification
Exercise 3: The supervised learning workflow
Exercise 4: The classification challenge
Exercise 5: k-Nearest Neighbors: Fit
Exercise 6: k-Nearest Neighbors: Predict
Exercise 7: Measuring model performance
Exercise 8: Train/test split + computing accuracy
Exercise 9: Overfitting and underfitting (current exercise)
Exercise 10: Visualizing model complexity

In this chapter, you will be introduced to regression, and build models to predict sales values using a dataset on advertising expenditure. You will learn about the mechanics of linear regression and common performance metrics such as R-squared and root mean squared error. You will perform k-fold cross-validation, and apply regularization to regression models to reduce the risk of overfitting.

Exercise 1: Introduction to regression
Exercise 2: Creating features
Exercise 3: Building a linear regression model
Exercise 4: Visualizing a linear regression model
Exercise 5: The basics of linear regression
Exercise 6: Fit and predict for regression
Exercise 7: Regression performance
Exercise 8: Cross-validation
Exercise 9: Cross-validation for R-squared
Exercise 10: Analyzing cross-validation metrics
Exercise 11: Regularized regression
Exercise 12: Regularized regression: Ridge
Exercise 13: Lasso regression for feature importance

Having trained models, now you will learn how to evaluate them. In this chapter, you will be introduced to several metrics along with a visualization technique for analyzing classification model performance using scikit-learn. You will also learn how to optimize classification and regression models through the use of hyperparameter tuning.

Exercise 1: How good is your model?
Exercise 2: Deciding on a primary metric
Exercise 3: Assessing a diabetes prediction classifier
Exercise 4: Logistic regression and the ROC curve
Exercise 5: Building a logistic regression model
Exercise 6: The ROC curve
Exercise 7: ROC AUC
Exercise 8: Hyperparameter tuning
Exercise 9: Hyperparameter tuning with GridSearchCV
Exercise 10: Hyperparameter tuning with RandomizedSearchCV

Learn how to impute missing values, convert categorical data to numeric values, scale data, evaluate multiple supervised learning models simultaneously, and build pipelines to streamline your workflow!

Exercise 1: Preprocessing data
Exercise 2: Creating dummy variables
Exercise 3: Regression with categorical features
Exercise 4: Handling missing data
Exercise 5: Dropping missing data
Exercise 6: Pipeline for song genre prediction: I
Exercise 7: Pipeline for song genre prediction: II
Exercise 8: Centering and scaling
Exercise 9: Centering and scaling for regression
Exercise 10: Centering and scaling for classification
Exercise 11: Evaluating multiple models
Exercise 12: Visualizing regression model performance
Exercise 13: Predicting on the test set
Exercise 14: Visualizing classification model performance
Exercise 15: Pipeline for predicting song popularity
Exercise 16: Congratulations