1. Visualizing the isolation score
You've learned to use isolation forests to create anomaly scores. Now let's visualize them!
In this video, you'll learn to make powerful visualizations called contour plots that show how the isolation forest anomaly score varies across an entire region. This is possible because the predict function enables scores to be calculated at any location, even where no data were observed.
2. Sequences of values
The contour plot works by showing the anomaly scores on a grid that spans the region covered by the Heights and Widths in the furniture data.
To make the grid, we first generate a regular sequence of values for each of the Width and Height variables using the seq function.
The seq function has arguments from, to, and length dot out, which define the lower bound, the upper bound, and the number of values in the sequence. The code shown generates a pair of sequences, each of length 20, running from the minimum to the maximum value of the Height and Width variables.
The resulting objects h underscore seq and w underscore seq are numeric vectors each with length 20.
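As a rough sketch of this step, the code below builds the two sequences with seq. The furniture data frame here is simulated purely for illustration, since the real data are not reproduced in this transcript; only the column names Height and Width and the object names h_seq and w_seq come from the narration.

```r
# Hypothetical stand-in for the furniture data: two clusters of items.
set.seed(1)
furniture <- data.frame(
  Height = c(rnorm(50, mean = 80, sd = 5), rnorm(50, mean = 180, sd = 10)),
  Width  = c(rnorm(50, mean = 60, sd = 5), rnorm(50, mean = 100, sd = 10))
)

# Regular sequences of 20 values spanning the observed range of each feature.
h_seq <- seq(from = min(furniture$Height), to = max(furniture$Height),
             length.out = 20)
w_seq <- seq(from = min(furniture$Width), to = max(furniture$Width),
             length.out = 20)

# Both are numeric vectors with 20 elements.
length(h_seq)
length(w_seq)
```

The first and last elements of each sequence equal the observed minimum and maximum, so the grid built from them covers exactly the region the data occupy.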
3. Building a grid
The next step is to build a grid of values from the pair of sequences. The function expand dot grid returns a data frame containing all possible combinations of the values in a set of input sequences.
By passing the pair of sequences w underscore seq and h underscore seq as inputs, the data frame furniture underscore grid is returned, which contains the required grid of Heights and Widths.
It's important that the column names of furniture underscore grid match those of the original features so that we can use the predict function in the next step. The syntax Width equals w underscore seq and Height equals h underscore seq used inside expand dot grid ensures that furniture underscore grid inherits the correct column names.
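A minimal sketch of the grid-building step follows. The numeric ranges are made up for illustration; in the narration the sequences come from the minimum and maximum of the furniture data.

```r
# Illustrative sequences (assumed ranges, not taken from the real data).
w_seq <- seq(from = 40, to = 130, length.out = 20)
h_seq <- seq(from = 60, to = 210, length.out = 20)

# Naming the arguments sets the column names of the resulting data frame.
furniture_grid <- expand.grid(Width = w_seq, Height = h_seq)

names(furniture_grid)    # column names match the original features
nrow(furniture_grid)     # one row per Width/Height combination
head(furniture_grid, 3)
```

Calling expand.grid(w_seq, h_seq) without names would give columns called Var1 and Var2 instead, which would not match the features the model was fitted on.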
4. Scoring the grid
The object furniture underscore grid is a data frame containing 400 grid points, one for each of the 20 by 20 combinations of Width and Height, spread over the two-dimensional space of Height and Width.
Let's now suppose we've already fitted an isolation forest model to the furniture data, called furniture underscore forest. Obtaining isolation scores for the grid of data uses the predict function and is just as easy as it was for the original data. As before, the first argument of predict is the model object furniture underscore forest, and the second argument is the data for which we'd like predictions, in this case the grid called furniture underscore grid.
The resulting anomaly score is appended to the furniture underscore grid data frame as a new column to make it easier to visualize.
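The scoring step might look something like the sketch below. The narration does not say which isolation forest package was used, so this example assumes the isotree package from CRAN as a stand-in (it must be installed separately), and the furniture data are again simulated. The fitting call and its settings are illustrative choices, not the course's own.

```r
library(isotree)  # assumption: stand-in isolation forest implementation

# Hypothetical stand-in for the furniture data.
set.seed(1)
furniture <- data.frame(
  Height = c(rnorm(50, mean = 80, sd = 5), rnorm(50, mean = 180, sd = 10)),
  Width  = c(rnorm(50, mean = 60, sd = 5), rnorm(50, mean = 100, sd = 10))
)

# Fit the forest; ndim = 1 requests ordinary single-variable splits.
furniture_forest <- isolation.forest(furniture, ntrees = 100, ndim = 1,
                                     nthreads = 1, seed = 1)

# Grid spanning the observed region, with matching column names.
furniture_grid <- expand.grid(
  Width  = seq(min(furniture$Width), max(furniture$Width), length.out = 20),
  Height = seq(min(furniture$Height), max(furniture$Height), length.out = 20)
)

# Model first, new data second; columns are put in the training order.
furniture_grid$score <- predict(furniture_forest,
                                furniture_grid[, names(furniture)])

summary(furniture_grid$score)
```

Each of the 400 grid points receives its own score, including points far from any observed furniture item.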
5. Make the contour plot!
A contour plot is produced using the contourplot function from the lattice package. The first argument is a formula. On the left side is the score, which will be displayed as colors and contours. On the right side are the feature names Height and Width, separated by a plus sign. The data argument must refer to the data frame containing the columns named in the formula, which in this case is furniture underscore grid. The final argument, region, is a logical value that determines whether the space between contours is filled with color. Setting it to TRUE makes the plot much easier to interpret.
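Here is a small sketch of the plotting call on its own. To keep it runnable with only the lattice package, the score column is a placeholder based on distance from two made-up cluster centers; it is not an isolation forest score, and in practice the column would come from predict. The grid ranges and centers are likewise invented for illustration.

```r
library(lattice)

# Illustrative grid (assumed ranges, not the real furniture data).
furniture_grid <- expand.grid(
  Width  = seq(from = 40, to = 130, length.out = 20),
  Height = seq(from = 60, to = 210, length.out = 20)
)

# Placeholder score, NOT a model output: scaled distance to the nearer of
# two hypothetical cluster centers, so values are low near the "clusters".
d1 <- sqrt(((furniture_grid$Width - 60) / 90)^2 +
           ((furniture_grid$Height - 80) / 150)^2)
d2 <- sqrt(((furniture_grid$Width - 100) / 90)^2 +
           ((furniture_grid$Height - 180) / 150)^2)
furniture_grid$score <- pmin(d1, d2)

# Score on the left of the formula, the two features on the right.
p <- contourplot(score ~ Height + Width, data = furniture_grid,
                 region = TRUE)
print(p)  # lattice plots are drawn when the returned object is printed
```

Because contourplot returns an object rather than drawing immediately, the explicit print call matters inside scripts and functions; at the interactive console the plot appears automatically.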
The resulting contour plot is shown and is a great way to see how the score varies across the region. You can see the two clusters of furniture points as pink blobs with low anomaly scores, while the highest anomaly scores, which exceed 0 point 7, appear in the darker blue corners of the region.
6. Let's practice!
Let's practice making some contour plots!