
The kernel trick

1. The kernel trick

In this lesson we'll take our first steps towards understanding how the concepts we learned for linear SVMs can be extended to classification problems with complex decision boundaries.

2. The basic idea

The basic idea is to devise a mathematical transformation that renders the problem linearly separable. We'll see how to do this explicitly for a radially separable dataset, i.e., one that has a circular boundary. We'll tackle more general boundaries in the next chapter.

3. The radially separable dataset

As a reminder, here's a plot of the radially separable dataset we created earlier. The data is separated into two classes with the boundary being a circle of radius 0-point-7 units. How would you transform the variables x1 and x2 so as to make the boundary a straight line instead of a circle?
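The lesson's original data-generation code isn't shown in this transcript, but a minimal sketch of such a dataset might look like this (the data frame name df and the column names x1, x2, and y are assumptions):

set.seed(1)
n <- 500
df <- data.frame(x1 = runif(n, min = -1, max = 1),
                 x2 = runif(n, min = -1, max = 1))
radius <- 0.7
# label points inside the circle of radius 0.7 as one class, outside as the other
df$y <- factor(ifelse(df$x1^2 + df$x2^2 < radius^2, -1, 1))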

4. Transforming the problem

OK, you might reason as follows: we know that the equation of the boundary is x1 squared plus x2 squared equals 0-point-49. So, let's map x1 squared to a new variable X1 and x2 squared to a new variable X2. In terms of the new variables, the equation of the boundary becomes X1 plus X2 equals 0-point-49, which is a straight line. Let's create a plot of the boundary using the transformed variables.
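Assuming the df object from the sketch above, the transformation is just a matter of squaring each coordinate:

# map x1 -> X1 = x1^2 and x2 -> X2 = x2^2
df_trans <- data.frame(X1 = df$x1^2,
                       X2 = df$x2^2,
                       y  = df$y)
# in these variables the boundary is the straight line X1 + X2 = 0.49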

5. Plot in X1-X2 space - code

We'll use ggplot() to plot the transformed dataset. Note that the equation of the line in the transformed space is X2 equals -X1 plus 0-point-49, so the slope of the line is -1 and the intercept is 0-point-49. We can now write the ggplot() code as follows. Let's see what the plot looks like.
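The exact plotting code isn't reproduced in this transcript; a sketch along the same lines, assuming the df_trans object created above, is:

library(ggplot2)
ggplot(df_trans, aes(x = X1, y = X2, colour = y)) +
  geom_point() +
  # decision boundary in the transformed space: X2 = -X1 + 0.49
  geom_abline(slope = -1, intercept = 0.49)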

6. Plot in X1-X2 space

As expected, the decision boundary is a straight line in the transformed space. Our transformation has made the problem linearly separable. Put another way, if we choose to solve the problem in terms of the original variables, then we should use a quadratic (i.e., second degree polynomial) boundary instead of a straight line.

7. The Polynomial Kernel - Part 1

The polynomial kernel has the general form shown in the slide. The "degree" is the degree of the polynomial. gamma and coef0 are tuning parameters, and u and v are two data points. Now, since we already knew our problem was radially separable, we could guess that we needed a 2nd degree polynomial, but we did not know its exact form.
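The slide itself isn't reproduced in this transcript; for reference, the polynomial kernel as implemented in libsvm (which e1071's svm() uses) is

K(u, v) = (\gamma \, u^{\top} v + \mathrm{coef0})^{\mathrm{degree}}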

8. Kernel functions

It turns out that, because of the mathematical formulation of SVMs, one cannot choose just any transformation. The transformation function must have specific properties. Functions that satisfy these properties are called kernel functions. For those of you familiar with vector algebra, kernel functions are generalizations of dot products. If you're not familiar with vector algebra, just remember that the basic idea is to find a kernel function that separates the data well.

9. Radially separable dataset - quadratic kernel

OK, so let's have a quick look at how a quadratic kernel performs on this dataset. First we do the usual 80/20 train/test split and then use the svm() function, setting the degree of the polynomial to 2 and using default values for gamma, coef0, and cost. We get a test accuracy of over 93% compared to the 64% we got with a linear kernel in the previous lesson. This is encouraging, even though it's only for a specific train/test partition. We then visualize the model using the built-in plot() function.
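The code for this step isn't included in the transcript; a sketch that follows the same recipe, assuming the df object from the earlier sketch (object and column names are assumptions), might be:

library(e1071)

# 80/20 train/test split
set.seed(2)
train_idx <- sample(seq_len(nrow(df)), size = floor(0.8 * nrow(df)))
trainset  <- df[train_idx, ]
testset   <- df[-train_idx, ]

# quadratic (degree-2 polynomial) kernel with default gamma, coef0 and cost
svm_model <- svm(y ~ ., data = trainset,
                 type = "C-classification",
                 kernel = "polynomial", degree = 2)

# accuracy on the test set
pred <- predict(svm_model, testset)
mean(pred == testset$y)

# visualize the model
plot(svm_model, trainset)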

10. Plot of quadratic SVM

This looks much better than the linear SVM: it has a clean decision boundary, even with the default values for the parameters. We'll see how to tune this up in the next lesson.

11. Time to practice!

Now it's your turn.