1. RBF Kernels: Generating a complex dataset
In this chapter, we'll learn about the radial basis function kernel, a general-purpose kernel that has proven very useful in practice. I'll abbreviate this to RBF from here on.
2. A bit about RBF Kernels
The RBF kernel is highly flexible: it can fit very complex decision boundaries. Indeed, when you use SVMs in your own work, you will almost certainly use the RBF kernel. We'll start our journey by generating a two-dimensional dataset with a complex decision boundary.
3. Generate a complex dataset
The first step is to generate the data points. We'll generate 600 points and, to make things interesting, use different distributions for the x1 and x2 attributes. As shown in the code, x1 is normally distributed with mean -0.5 and standard deviation 1, while x2 is uniformly distributed between -1 and 1.
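The lesson's own code is not reproduced here, so the following is a minimal Python sketch of the same sampling step, using only the standard library. The function name `generate_points` and the seed value are illustrative assumptions; only the sample size and the two distributions come from the text.

```python
import random

def generate_points(n=600, seed=10):
    """Draw n points with x1 ~ Normal(mean=-0.5, sd=1) and x2 ~ Uniform(-1, 1)."""
    # Seeded generator for reproducibility; the seed value itself is arbitrary.
    rng = random.Random(seed)
    x1 = [rng.gauss(-0.5, 1.0) for _ in range(n)]
    x2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return x1, x2

x1, x2 = generate_points()
```

Keeping x1 and x2 as separate lists mirrors the two attributes described in the text; they can be zipped into (x1, x2) pairs when labeling.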
4. Generate boundary
The decision boundary consists of two circles that just touch each other at the origin. The first four lines of code set the radii and centers of the two circles. Since each radius is 0.7 units and the circles just touch, their centers are 0.7 + 0.7 = 1.4 units apart. We will see this more clearly when we visualize the dataset later. The last, long line of code assigns each point to class -1 or 1, depending on whether the point lies inside either of the two circles or outside both. With that done, let's visualize the dataset.
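The labeling rule can be sketched in Python as below. Two details are assumptions not fixed by the text: the centers are placed on the x1 axis at (-0.7, 0) and (0.7, 0) (one placement consistent with two radius-0.7 circles touching at the origin), and points inside a circle get -1 while points outside both get 1. The helper name `label` and the seed are illustrative.

```python
import random

# Two circles of radius 0.7 that touch at the origin.
# ASSUMPTION: centers lie on the x1 axis, 1.4 units apart.
radius = 0.7
centers = [(-radius, 0.0), (radius, 0.0)]

def label(x1, x2):
    """Return -1 if (x1, x2) is strictly inside either circle, else 1.

    ASSUMPTION: -1 means inside, 1 means outside both circles.
    """
    inside = any((x1 - c1) ** 2 + (x2 - c2) ** 2 < radius ** 2 for c1, c2 in centers)
    return -1 if inside else 1

# Regenerate the 600 points (same distributions as the previous step) and label them.
rng = random.Random(10)
points = [(rng.gauss(-0.5, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(600)]
y = [label(a, b) for a, b in points]
```

Comparing squared distances against the squared radius avoids a square root per point; a point exactly on a circle (such as the origin) counts as outside.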
5. Visualizing the dataset
As usual, we use ggplot() in a way that should now be familiar, distinguishing the two classes using color. Let's see what the plot looks like.
6. Complex dataset
OK, so here it is. Let's add the decision boundary to make the separation clearer and to have a reference to compare against when we solve the classification problem using the RBF kernel.
7. Code to visualize the boundary
The code to generate the boundary is much the same as what we used to generate the radially separable dataset in Chapter 3. The function generates npoint points lying on a circle of radius r, centered at (x1Center, x2Center). The plotting code uses this function to generate and plot the boundary, which consists of the two circles described earlier.
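A Python sketch of such a circle-point function follows, using the parametric form x1 = x1Center + r cos(theta), x2 = x2Center + r sin(theta). The name `circle`, the snake_case parameter names, the default npoint of 100, and the on-axis centers are assumptions for illustration, not the lesson's actual code.

```python
import math

def circle(x1_center, x2_center, r, npoint=100):
    """Return npoint (x1, x2) points evenly spaced on a circle of radius r
    centered at (x1_center, x2_center)."""
    # Angles evenly spaced over [0, 2*pi), so no point is repeated.
    thetas = [2 * math.pi * i / npoint for i in range(npoint)]
    return [(x1_center + r * math.cos(t), x2_center + r * math.sin(t)) for t in thetas]

# ASSUMPTION: the two boundary circles are centered at (-0.7, 0) and (0.7, 0).
radius = 0.7
boundary = circle(-radius, 0.0, radius) + circle(radius, 0.0, radius)
```

Excluding the angle 2*pi avoids drawing the starting point twice; a path-style plot would need the first point appended again to close each circle.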
8. Visualizing the boundary
OK, so here's what the dataset and boundary look like. Note that the circles appear squashed because the two axes have different scales. In the next lesson, we will build linear and polynomial SVMs on this dataset. Apart from being a good review of what we've done so far, their poor performance will lead us naturally to a discussion of the RBF kernel.
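To see why the circles look distorted, compare how much of each axis the data spans. This Python sketch regenerates the points with the same distributions (the seed is arbitrary) and computes the ratio of the x1 span to the x2 span. It assumes a roughly square plot panel with no fixed aspect ratio, so each axis is stretched to fit its own data range.

```python
import random

rng = random.Random(10)
x1 = [rng.gauss(-0.5, 1.0) for _ in range(600)]
x2 = [rng.uniform(-1.0, 1.0) for _ in range(600)]

x1_span = max(x1) - min(x1)  # normal draws: typically several units wide
x2_span = max(x2) - min(x2)  # uniform draws: at most 2 units wide

# In a square panel, one x1 unit is drawn shorter than one x2 unit by this
# factor, so each circle looks narrower along x1 than along x2.
stretch = x1_span / x2_span
```

A stretch factor well above 1 is consistent with the squashed circles in the plot.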
9. Time to practice!
But before we do that, let's practice what we've learned in this lesson by generating a complex dataset that you will use in the exercises for this chapter.