Creating features

In this chapter, you will work with a dataset called sales_df, which contains advertising expenditure by media type and the sales revenue, in dollars, generated by the corresponding campaign. The dataset has been preloaded for you. Here are the first two rows:

     tv        radio      social_media    sales
1    13000.0   9237.76    2409.57         46677.90
2    41000.0   15886.45   2913.41         150177.83

You will use the advertising expenditure as features to predict sales, starting with the "radio" column. Before you can make any predictions, however, you need to create the feature and target arrays and reshape the features into the two-dimensional format that scikit-learn expects.
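To make the format requirement concrete, here is a minimal sketch using only the two "radio" values from the rows shown above. scikit-learn estimators expect the feature array to be two-dimensional, with one row per observation and one column per feature, while the target can stay one-dimensional. The variable names radio_1d and radio_2d are illustrative and not part of the exercise.

```python
import numpy as np

# The two "radio" values from the rows shown above (illustrative only).
radio_1d = np.array([9237.76, 15886.45])

# A single column pulled out of a table is one-dimensional: shape (n_samples,).
print(radio_1d.shape)

# reshape(-1, 1) keeps the number of rows (-1 means "infer it") and
# adds a second axis of length 1: shape (n_samples, n_features).
radio_2d = radio_1d.reshape(-1, 1)
print(radio_2d.shape)
```

The values themselves do not change; only the layout does, from a flat sequence to a single-column table.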

This exercise is part of the course

Supervised Learning with scikit-learn


Exercise instructions

Create X, an array of the values from the "radio" column of the sales_df DataFrame.
Create y, an array of the values from the "sales" column of the sales_df DataFrame.
Reshape X into a two-dimensional NumPy array.
Print the shape of X and y.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

import numpy as np

# Create X from the radio column's values
X = ____

# Create y from the sales column's values
y = ____

# Reshape X
X = ____

# Check the shape of the features and targets
print(____)
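
One possible way to fill in the blanks is sketched below. Because the preloaded sales_df is not available outside the exercise environment, this sketch builds a stand-in DataFrame from only the two rows shown earlier; that stand-in is an assumption for illustration, and the real dataset has more rows, so the printed shapes will differ there.

```python
import numpy as np
import pandas as pd

# Stand-in for the preloaded sales_df (assumption: only the two rows shown above).
sales_df = pd.DataFrame(
    {
        "tv": [13000.0, 41000.0],
        "radio": [9237.76, 15886.45],
        "social_media": [2409.57, 2913.41],
        "sales": [46677.90, 150177.83],
    },
    index=[1, 2],
)

# Create X from the radio column's values (a one-dimensional NumPy array)
X = sales_df["radio"].values

# Create y from the sales column's values
y = sales_df["sales"].values

# Reshape X into a single-column, two-dimensional array
X = X.reshape(-1, 1)

# Check the shape of the features and targets
print(X.shape, y.shape)
```

With the two-row stand-in, X has shape (2, 1) and y has shape (2,). The target y is left one-dimensional, since only the feature array needs the extra axis.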


In this chapter, you'll be introduced to classification problems and learn how to solve them using supervised learning techniques. You'll learn how to split data into training and test sets, fit a model, make predictions, and evaluate accuracy. You’ll discover the relationship between model complexity and performance, applying what you learn to a churn dataset, where you will classify the churn status of a telecom company's customers.

Exercise 1: Machine learning with scikit-learn
Exercise 2: Binary classification
Exercise 3: The supervised learning workflow
Exercise 4: The classification challenge
Exercise 5: k-Nearest Neighbors: Fit
Exercise 6: k-Nearest Neighbors: Predict
Exercise 7: Measuring model performance
Exercise 8: Train/test split + computing accuracy
Exercise 9: Overfitting and underfitting
Exercise 10: Visualizing model complexity

In this chapter, you will be introduced to regression, and build models to predict sales values using a dataset on advertising expenditure. You will learn about the mechanics of linear regression and common performance metrics such as R-squared and root mean squared error. You will perform k-fold cross-validation, and apply regularization to regression models to reduce the risk of overfitting.

Exercise 1: Introduction to regression
Exercise 2: Creating features (current exercise)
Exercise 3: Building a linear regression model
Exercise 4: Visualizing a linear regression model
Exercise 5: The basics of linear regression
Exercise 6: Fit and predict for regression
Exercise 7: Regression performance
Exercise 8: Cross-validation
Exercise 9: Cross-validation for R-squared
Exercise 10: Analyzing cross-validation metrics
Exercise 11: Regularized regression
Exercise 12: Regularized regression: Ridge
Exercise 13: Lasso regression for feature importance

Having trained models, now you will learn how to evaluate them. In this chapter, you will be introduced to several metrics along with a visualization technique for analyzing classification model performance using scikit-learn. You will also learn how to optimize classification and regression models through the use of hyperparameter tuning.

Exercise 1: How good is your model?
Exercise 2: Deciding on a primary metric
Exercise 3: Assessing a diabetes prediction classifier
Exercise 4: Logistic regression and the ROC curve
Exercise 5: Building a logistic regression model
Exercise 6: The ROC curve
Exercise 7: ROC AUC
Exercise 8: Hyperparameter tuning
Exercise 9: Hyperparameter tuning with GridSearchCV
Exercise 10: Hyperparameter tuning with RandomizedSearchCV

Learn how to impute missing values, convert categorical data to numeric values, scale data, evaluate multiple supervised learning models simultaneously, and build pipelines to streamline your workflow!

Exercise 1: Preprocessing data
Exercise 2: Creating dummy variables
Exercise 3: Regression with categorical features
Exercise 4: Handling missing data
Exercise 5: Dropping missing data
Exercise 6: Pipeline for song genre prediction: I
Exercise 7: Pipeline for song genre prediction: II
Exercise 8: Centering and scaling
Exercise 9: Centering and scaling for regression
Exercise 10: Centering and scaling for classification
Exercise 11: Evaluating multiple models
Exercise 12: Visualizing regression model performance
Exercise 13: Predicting on the test set
Exercise 14: Visualizing classification model performance
Exercise 15: Pipeline for predicting song popularity
Exercise 16: Congratulations