Hyperparameter tuning with RandomizedSearchCV

As you saw, GridSearchCV can be computationally expensive, especially when you search a large hyperparameter space. In that case you can use RandomizedSearchCV, which tests a fixed number of hyperparameter settings sampled from specified probability distributions.
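To make the idea of "a fixed number of settings" concrete, here is a minimal sketch that uses scikit-learn's ParameterSampler to draw candidates the way a randomized search would, and compares that with the size of an exhaustive grid. The search space, n_iter=5, and the uniform distribution for C are illustrative assumptions, not part of the exercise.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical search space: one discrete list and one continuous distribution.
param_distributions = {
    "penalty": ["l1", "l2"],
    "C": uniform(loc=0.1, scale=0.9),  # uniform on [0.1, 1.0]
}

# Draw a fixed number of candidate settings, regardless of how large the space is.
n_iter = 5
candidates = list(
    ParameterSampler(param_distributions, n_iter=n_iter, random_state=42)
)
for candidate in candidates:
    print(candidate)

# For comparison: an exhaustive grid over 50 values of C and 2 penalties.
grid = ParameterGrid({"penalty": ["l1", "l2"], "C": np.linspace(0.1, 1.0, 50)})
print("Grid candidates:", len(grid), "| sampled candidates:", len(candidates))
```

The grid has to evaluate every combination, while the sampler stops after n_iter draws, which is where the savings in compute come from.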

Training and test sets from diabetes_df have been preloaded for you as X_train, X_test, y_train, and y_test, where the target is "diabetes". A logistic regression model has been created and stored as logreg, along with a KFold variable stored as kf.
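If you want to follow along outside the course environment, the preloaded objects could be recreated roughly as follows. This is a sketch under stated assumptions: the data is synthetic (make_classification stands in for diabetes_df, which is not available here), and the split size, number of folds, and random seeds are guesses rather than the course's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, train_test_split

# Synthetic stand-in for the features and the binary "diabetes" target (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Hold out 30% of the rows as a test set (assumed split size).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The model and the cross-validation splitter referred to as logreg and kf.
logreg = LogisticRegression()
kf = KFold(n_splits=5, shuffle=True, random_state=42)

print(X_train.shape, X_test.shape, kf.get_n_splits())
```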

You will define a range of hyperparameters and use RandomizedSearchCV, which has been imported from sklearn.model_selection, to look for optimal hyperparameters from these options.

This exercise is part of the course

Supervised Learning with scikit-learn

Exercise instructions

Create params, adding "l1" and "l2" as penalty values, setting C to a range of 50 float values between 0.1 and 1.0, and class_weight to either "balanced" or a dictionary containing 0:0.8, 1:0.2.
Instantiate the RandomizedSearchCV object, passing the model and the parameters, and setting cv equal to kf.
Fit logreg_cv to the training data.
Print the model's best parameters and accuracy score.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Create the parameter space
params = {"penalty": ["____", "____"],
          "tol": np.linspace(0.0001, 1.0, 50),
          "C": np.linspace(____, ____, ____),
          "class_weight": ["____", {0:____, 1:____}]}

# Instantiate the RandomizedSearchCV object
logreg_cv = ____(____, ____, cv=____)

# Fit the data to the model
logreg_cv.____(____, ____)

# Print the tuned parameters and score
print("Tuned Logistic Regression Parameters: {}".format(____.____))
print("Tuned Logistic Regression Best Accuracy Score: {}".format(____.____))
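For reference, one way the completed template might look when run end to end is sketched below. It is not the course's official solution: the data is synthetic (the real diabetes_df is not available here), and solver="liblinear" is an assumption chosen because that solver accepts both "l1" and "l2" penalties, whereas the exercise does not show how logreg was configured. The random_state values are also added only for reproducibility.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, RandomizedSearchCV, train_test_split

# Synthetic stand-in for the preloaded diabetes data (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Assumed model setup: liblinear supports both "l1" and "l2" penalties.
logreg = LogisticRegression(solver="liblinear")
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Create the parameter space described in the instructions.
params = {
    "penalty": ["l1", "l2"],
    "tol": np.linspace(0.0001, 1.0, 50),
    "C": np.linspace(0.1, 1.0, 50),
    "class_weight": ["balanced", {0: 0.8, 1: 0.2}],
}

# Instantiate the RandomizedSearchCV object (n_iter defaults to 10 candidates).
logreg_cv = RandomizedSearchCV(logreg, params, cv=kf, random_state=42)

# Fit the search on the training data.
logreg_cv.fit(X_train, y_train)

# Print the tuned parameters and the mean cross-validated accuracy.
print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_))
print("Tuned Logistic Regression Best Accuracy Score: {}".format(logreg_cv.best_score_))
```

The full space holds 2 × 50 × 50 × 2 = 10,000 combinations, but only 10 are evaluated here, each with 5-fold cross-validation. Because the candidates are sampled at random, the reported best parameters depend on the seed and on the data.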
