Creating random test datasets
Before building a more sophisticated lending model, it is important to hold out a portion of the loan data to simulate how well it will predict the outcomes of future loan applicants.
You can use 75% of the observations for training and 25% for testing the model.
The sample() function can be used to generate a random sample of rows to include in the training set. Simply supply it the total number of observations and the number needed for training. Use the resulting vector of row IDs to subset the loans into training and testing datasets. The dataset loans is available for you to use.
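For instance, here is a quick illustrative sketch of sample() at the console (the numbers are arbitrary):

# Draw 3 row IDs at random from 1:10, without replacement
set.seed(123)  # optional: makes the random draw reproducible
sample(10, 3)

Because the draw is random, each run produces a different set of row IDs unless you set a seed first.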
Exercise instructions
- Apply the nrow() function to determine how many observations are in the loans dataset, and the number needed for a 75% sample.
- Use the sample() function to create an integer vector of row IDs for the 75% sample. The first argument of sample() should be the number of rows in the dataset, and the second is the number of rows you need in your training set.
- Subset the loans data using the row IDs to create the training dataset. Save this as loans_train.
- Subset loans again, but this time select all the rows that are not in sample_rows. Save this as loans_test.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Determine the number of rows for training
nrow(___)

# Create a random sample of row IDs
sample_rows <- sample(___, ___)

# Create the training dataset
loans_train <- loans[___, ]

# Create the test dataset
loans_test <- loans[___, ]
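For reference, here is one possible completion of the scaffold (a sketch, assuming loans is a data frame already loaded in your session; n_train is a helper name introduced here for clarity):

# Determine the number of rows for training (75% of the data)
n_train <- floor(0.75 * nrow(loans))

# Create a random sample of row IDs
sample_rows <- sample(nrow(loans), n_train)

# Create the training dataset
loans_train <- loans[sample_rows, ]

# Create the test dataset: every row NOT in sample_rows
loans_test <- loans[-sample_rows, ]

The negative index -sample_rows selects all rows except those chosen for training, so the two subsets never overlap.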
This exercise is part of the course
Supervised Learning in R: Classification
In this course you will learn the basics of machine learning for classification.
Classification trees use flowchart-like structures to make decisions. Because humans can readily understand these tree structures, classification trees are useful when transparency is needed, such as in loan approval. We'll use the Lending Club dataset to simulate this scenario.
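As a preview of where the course goes, here is a minimal sketch of fitting a classification tree with the rpart package (the column names outcome, loan_amount, and credit_score are hypothetical placeholders, not the actual Lending Club fields):

library(rpart)

# Fit a classification tree predicting loan outcome from applicant features
loan_model <- rpart(outcome ~ loan_amount + credit_score,
                    data = loans_train, method = "class")

# Predict the class of each applicant in the held-out test set
predict(loan_model, loans_test, type = "class")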