1. Introduction
This chapter introduces some key concepts of support vector machines through a simple 1-dimensional example. Students are also walked through the creation of a linearly separable dataset that is used in the subsequent chapter.
2. Support Vector Classifiers: Linear Kernels
Introduces students to the basic concepts of support vector machines by applying the SVM algorithm to a linearly separable dataset. Key concepts are illustrated through ggplot visualisations built from the outputs of the algorithm, and the role of the cost parameter is highlighted via a simple example. The chapter closes with a section on how the algorithm handles multiclass problems.
3. Polynomial Kernels
Provides an introduction to polynomial kernels via a dataset that is radially separable (i.e., has a circular decision boundary). After demonstrating the inadequacy of linear kernels for this dataset, students will see how a simple transformation renders the problem linearly separable, thus motivating an intuitive discussion of the kernel trick. Students will then apply the polynomial kernel to the dataset and tune the resulting classifier.
4. Radial Basis Function Kernels
Builds on the previous three chapters by introducing the highly flexible Radial Basis Function (RBF) kernel. Students will create a "complex" dataset that exposes the limitations of polynomial kernels. Then, following an intuitive motivation for the RBF kernel, students will see how it addresses the shortcomings of the other kernels discussed in this course.


Creating training and test datasets

Splitting a dataset into training and test sets is an important step in building and testing a classification model. The training set is used to build the model and the test set to evaluate its predictive accuracy.

In this exercise, you will split the dataset you created in the previous chapter into training and test sets. The dataset has been loaded into the data frame df, and a seed has already been set to ensure reproducibility. Recall that in the previous video we set the upper bound on the number of rows in the training set using some handy functions; now it's your turn to implement them!

Determine the upper bound for the number of rows in the training set and store it in sample_size.
Create the vector train, which stores the row indices randomly assigned to the training set according to the 80/20 proportion.
Assign the rows indexed by train to the data frame trainset and the remaining rows to the data frame testset.
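The three steps above can be sketched in R as follows. This is a minimal sketch, not the course's official solution. Because df and the seed come from the exercise environment, the block builds a hypothetical stand-in df (200 random points labelled by which side of the line x1 = x2 they fall on) with an assumed seed of 42. It also assumes the upper bound is obtained by rounding 80% of the row count down with floor().

```r
# Hypothetical stand-in for the data frame df from the previous chapter:
# 200 points in the unit square, labelled by which side of the line x1 = x2 they lie on.
set.seed(42)  # assumed seed; the exercise sets its own
n <- 200
df <- data.frame(x1 = runif(n), x2 = runif(n))
df$y <- factor(ifelse(df$x1 - df$x2 > 0, -1, 1))

# Upper bound on the number of training rows: 80% of all rows, rounded down
sample_size <- floor(0.8 * nrow(df))

# Randomly choose sample_size distinct row indices for the training set
train <- sample(seq_len(nrow(df)), size = sample_size)

# Rows indexed by train form the training set; every other row forms the test set
trainset <- df[train, ]
testset <- df[-train, ]

# Quick check of the split sizes (160 training rows, 40 test rows)
c(train = nrow(trainset), test = nrow(testset))
```

Because sample() draws without replacement by default, train contains no duplicate indices. The negative index in df[-train, ] selects exactly the complement of those rows, so every row of df ends up in one set and only one.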