In the first chapter of this course, you'll fit regression models with <code>train()</code> and evaluate their out-of-sample performance using cross-validation and root-mean-square error (RMSE).

Welcome to the course

In-sample RMSE for linear regression

In-sample RMSE for linear regression on diamonds

Out-of-sample error measures

Out-of-sample RMSE for linear regression

Randomly order the data frame

Try an 80/20 split

Predict on test set

Calculate test set RMSE by hand

Comparing out-of-sample RMSE to in-sample RMSE

Cross-validation

Advantage of cross-validation

10-fold cross-validation

5-fold cross-validation

5 x 5-fold cross-validation

Making predictions on new data

Regression Models: Fitting and Evaluating Their Performance

In this chapter, you'll fit classification models with <code>train()</code> and evaluate their out-of-sample performance using cross-validation and area under the curve (AUC).

Logistic regression on sonar

Why a train/test split?

Try a 60/40 split

Fit a logistic regression model

Confusion matrix

Confusion matrix takeaways

Calculate a confusion matrix

Calculating accuracy

Calculating true positive rate

Calculating true negative rate

Class probabilities and predictions

Probabilities and classes

Try another threshold

From probabilities to confusion matrix

Introducing the ROC curve

What's the value of an ROC curve?

Plot an ROC curve

Area under the curve (AUC)

Model, ROC, and AUC

Customizing trainControl

Using custom trainControl

Classification Models: Fitting and Evaluating Their Performance

In this chapter, you will use the <code>train()</code> function to tweak model parameters through cross-validation and grid search.

Random forests and wine

Random forests vs. linear models

Fit a random forest

Explore a wider model space

Advantage of a longer tune length

Try a longer tune length

Custom tuning grids

Advantages of a custom tuning grid

Fit a random forest with custom tuning

Introducing glmnet

Advantage of glmnet

Make a custom trainControl

Fit glmnet with custom trainControl

glmnet with custom tuning grid

Why a custom tuning grid?

glmnet with custom trainControl and tuning

Interpreting glmnet plots

Tuning Model Parameters to Improve Performance

In this chapter, you will practice using <code>train()</code> to preprocess data before fitting models, improving your ability to make accurate predictions.

Median imputation

Median imputation vs. omitting rows

Apply median imputation

KNN imputation

Comparing KNN imputation to median imputation

Use KNN imputation

Compare KNN and median imputation

Multiple preprocessing methods

Order of operations

Combining preprocessing methods

Handling low-information predictors

Why remove near zero variance predictors?

Remove near zero variance predictors

preProcess() and nearZeroVar()

Fit model on reduced blood-brain data

Principal components analysis (PCA)

Using PCA as an alternative to nearZeroVar()

Preprocessing Data

In the final chapter of this course, you'll learn how to use <code>resamples()</code> to compare multiple models and select (or ensemble) the best one(s).

Reusing a trainControl

Why reuse a trainControl?

Make custom train/test indices

Reintroducing glmnet

glmnet as a baseline model

Fit the baseline model

Reintroducing random forest

Random forest drawback

Random forest with custom trainControl

Comparing models

Matching train/test indices

Create a resamples object

More on resamples

Create a box-and-whisker plot

Create a scatterplot

Ensembling models

Summary

Selecting Models: A Case Study in Churn Prediction

Diamonds

Sonar

Wine

Overfit data

Breast Cancer

Blood-brain

Churn

Machine learning is the study and application of algorithms that learn from data and make predictions on it. From search results to self-driving cars, it touches many areas of our lives and is one of the most exciting and fastest-growing fields of research in data science. This course teaches the big ideas in machine learning: how to build and evaluate predictive models, how to tune them for optimal performance, how to preprocess data for better results, and much more. The popular caret R package, which provides a consistent interface to many of R's most powerful machine learning facilities, is used throughout the course.
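As a rough illustration of what a consistent interface means in practice, here is a minimal sketch, not taken from the course exercises. It assumes the caret package is installed and uses the built-in mtcars data and the predictors wt and hp purely as illustrative choices. The same train() call shape fits two different algorithms, and each fitted object reports a cross-validated RMSE.

```r
library(caret)

set.seed(42)  # make the random fold assignment reproducible

# 5-fold cross-validation settings, reused for both models
ctrl <- trainControl(method = "cv", number = 5)

# Linear regression (mtcars and the formula are illustrative assumptions)
model_lm <- train(mpg ~ wt + hp, data = mtcars,
                  method = "lm", trControl = ctrl)

# Same call shape, different algorithm: k-nearest neighbours
model_knn <- train(mpg ~ wt + hp, data = mtcars,
                   method = "knn", trControl = ctrl)

# Cross-validated RMSE: one row for lm, one row per candidate k for knn
print(model_lm$results$RMSE)
print(min(model_knn$results$RMSE))
```

Only the method argument changes between the two fits; the resampling settings and the way results are read back stay the same.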

Introduction to Regression in R

Machine Learning with caret in R

Machine Learning Fundamentals in R

Machine Learning Scientist in R

Chapter 1: Regression Models: Fitting and Evaluating Their Performance

Chapter 2: Classification Models: Fitting and Evaluating Their Performance

Chapter 3: Tuning Model Parameters to Improve Performance

Chapter 4: Preprocessing Data

Chapter 5: Selecting Models: A Case Study in Churn Prediction
