1
Data Pre-processing and Visualization
In the first chapter of this course, you'll perform all the preprocessing steps required to create a predictive machine learning model, including how to handle missing values and outliers and how to normalize your dataset.
2
Supervised Learning
In the second chapter of this course, you'll practice several aspects of supervised machine learning, such as selecting the optimal feature subset, using regularization to avoid model overfitting, engineering features, and building ensemble models to address the so-called bias-variance trade-off.
3
Unsupervised Learning
In the third chapter of this course, you'll use unsupervised learning to apply feature extraction and visualization techniques for dimensionality reduction, and you'll use clustering methods to select not only an appropriate clustering algorithm but also the optimal number of clusters for a dataset.
4
Model Selection and Evaluation
In the fourth and final chapter of this course, you'll really step it up: you'll apply bootstrapping and cross-validation to evaluate how well a model generalizes, apply resampling techniques to imbalanced classes, detect and remove multicollinearity, and build an ensemble model.

K-means clustering

In a machine learning interview setting, you might be asked how the output of K-means clustering can be used to assess whether it is the best algorithm for the data.

In this exercise, you'll practice K-means clustering. You'll use the .inertia_ attribute to compare models with different numbers of clusters, k, and then use this information to assess the number of clusters in the next exercise.
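
If it helps to see what .inertia_ measures, here is a minimal sketch that recomputes it by hand as the sum of squared distances from each sample to its assigned cluster center. The synthetic points, the choice of 3 clusters, and the n_init and random_state values are illustrative assumptions, not part of the exercise.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumption: small synthetic 2-D data, used only to illustrate inertia.
rng = np.random.default_rng(0)
points = np.vstack(
    [rng.normal(loc=center, scale=0.5, size=(50, 2)) for center in (0.0, 5.0, 10.0)]
)

# Fit a 3-cluster k-means (n_init and random_state are illustrative choices).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)

# Look up the center assigned to each sample, then sum the squared distances.
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = float(((points - assigned_centers) ** 2).sum())

print("manual inertia: ", manual_inertia)
print("model .inertia_:", km.inertia_)
```

The two printed values should agree up to floating-point error. Because adding clusters generally lowers inertia, it is more useful for comparing values of k than as an absolute score.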

Recall that the target variable in the diabetes dataset is progression.

Where you are in the pipeline:

Machine learning pipeline

1
- Create a feature matrix X by dropping the target variable progression and fit the data to the instantiated k-means object.

2
- Instantiate a 5-cluster k-means, fit it to the feature matrix, and print its inertia.
3
- Fit the feature matrix to a 10-cluster k-means and print its inertia.
4
- Fit the feature matrix to a 20-cluster k-means and print its inertia.
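
The steps above can be sketched roughly as follows. This is not the course's solution code: it assumes scikit-learn's bundled diabetes data as a stand-in for the course DataFrame, renames its target column to progression to match the text, and picks n_init and random_state arbitrarily for reproducibility.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_diabetes

# Assumption: scikit-learn's bundled diabetes data stands in for the course's
# DataFrame; its "target" column is renamed to "progression" to match the text.
diabetes = load_diabetes(as_frame=True).frame.rename(columns={"target": "progression"})

# Step 1: create the feature matrix by dropping the target variable.
X = diabetes.drop(columns="progression")

# Steps 2-4: fit k-means with 5, 10, and 20 clusters and print each inertia.
inertias = {}
for k in (5, 10, 20):
    # n_init and random_state are illustrative choices, not course settings.
    model = KMeans(n_clusters=k, n_init=10, random_state=123)
    model.fit(X)
    inertias[k] = model.inertia_
    print(f"k={k}: inertia={model.inertia_:.4f}")
```

Collecting the values in a dictionary such as inertias makes it easy to compare them side by side when you assess the number of clusters.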