Local Outlier Factor
1. Local Outlier Factor
In this video, we'll learn about a popular density-based algorithm called Local Outlier Factor.

2. What is Local Outlier Factor (LOF)?
Local Outlier Factor, commonly referred to as LOF, is a well-known algorithm that was introduced in 2000. It works well with moderately high-dimensional datasets and is one of the fastest outlier classifiers.

3. How does LOF work?
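The density idea described in the narration can be sketched with plain k-nearest-neighbor distances. This toy example (scikit-learn's NearestNeighbors on made-up data, not from the video) approximates local density as the inverse of the mean distance to a point's neighbors:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy 2-D data (not from the video): a tight cluster plus one isolated point
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, size=(20, 2)),  # dense cluster
               [[3.0, 3.0]]])                     # far-away sample

# Local density can be approximated as the inverse of the mean distance
# to the k nearest neighbors
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1: each point is its own neighbor
dists, _ = nn.kneighbors(X)
density = 1.0 / dists[:, 1:].mean(axis=1)         # drop the self-distance column

# The isolated point has far lower local density than the cluster members
print(density[-1] < density[:-1].min())
```

A sample whose density is substantially lower than its neighbors' densities is exactly the kind of point LOF flags as an outlier.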
LOF classifies data points into inliers and outliers using a local outlier factor score, which is where the name comes from. The LOF score is based on the concept of local density, where locality is defined by choosing k-nearest neighbors, as in KNN. The density of a data point is then estimated from its distances to those chosen neighbors. Data points with similar densities form a cluster, while samples with substantially lower density than their local neighborhood are classified as outliers.

4. The importance of locality
Here, the word local is very important. The LOF score of a sample is not compared to the rest of the dataset, but only to its local neighborhood.

5. LOF visualized
Let's take a closer look at the visualization of LOF. In the plot, we can see a two-dimensional normalized dataset, with two clusters of data points and a dozen outliers. The circle sizes represent how anomalous samples are compared to their local neighborhood. The higher their LOF score, the bigger the circles.

6. LOF visualized
Notice the two highlighted points. Point A is an outlier, but its circle isn't large.

7. LOF visualized
That's because it is much closer to its local neighborhood than point B. Point B is far away and therefore deviates more from its neighborhood and has lower density, making its circle bigger. Because of this local approach, LOF can detect outliers that would have been missed in another area of the dataset.

8. Transformed dataset
Now, let's see LOF in action using the transformed version of the US Army males dataset.

9. LOF in action
We import the LOF estimator from pyod-dot-models-dot-lof and instantiate it with 20 neighbors and "manhattan" as the distance metric. After the fit, we use LOF's dot-labels_ attribute to print the inlier/outlier labels of the dataset.

10. Filtering in LOF
To filter out the outliers, we use a probability threshold of 55%. We calculate the outlier probabilities into probs and create a boolean mask that checks if probabilities are higher than the threshold. After applying the mask, LOF finds only two outliers.

11. LOF details
As in KNN, the most important parameter of LOF is n_neighbors, and it can be tuned in the same way. Also remember the best practice of choosing 20 neighbors whenever the number of outliers is below 10% of the total number of instances. If there are more than 10%, n_neighbors should be increased accordingly. Unlike KNN, LOF does not allow changing the method of aggregating local neighborhood distances; in KNN, that method is set to "largest" by default.

12. LOF drawbacks
Even though LOF has fewer parameters than KNN, its anomaly score is much harder to interpret. While KNN used the plain old distance between points as its anomaly score, LOF uses the local outlier factor score, which is calculated by performing additional computations on the distance metric, using concepts such as local reachability density. While LOF scores around one usually denote inliers, there is no clear rule as to which range of values represents outliers. These values are highly dataset-dependent.

13. Let's practice!
Let's practice what we've learned!