
Unsupervised learning

1. Unsupervised learning

Great job!

2. Unsupervised learning

Let's switch over to unsupervised learning.

3. Unsupervised learning

Unsupervised learning is quite similar to supervised learning, except that it doesn't have a target column, hence the "unsupervised" part. So what's the point then? Unsupervised learning learns from the dataset itself and tries to find patterns in it. That's what makes this technique so interesting and powerful: we can find insights without knowing much about our dataset.

4. Applications

Unsupervised learning has different applications. We'll focus on three of them: clustering, anomaly detection, and association.

5. Clustering

Clustering consists of identifying groups in your dataset. The observations in a group are more similar to the other members of their group than to members of other groups.
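
To make this definition concrete, here is a minimal Python sketch that measures the average distance between observations in the same group and between observations in different groups. It assumes Euclidean distance as the measure of (dis)similarity; the points, the group labels and the `mean_distances` helper are hypothetical and made up for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_distances(points, labels):
    """Return (mean within-group distance, mean between-group distance)."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        # Same label -> within-group pair, otherwise between-group pair.
        (within if labels[i] == labels[j] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical 2-D observations and group labels, made up for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
labels = ["A", "A", "A", "B", "B", "B"]

within, between = mean_distances(points, labels)
print(f"mean within-group distance:  {within:.2f}")
print(f"mean between-group distance: {between:.2f}")
```

A good clustering is one where the first number is clearly smaller than the second.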

6. Clustering example

For example, say we have a dataset with six observations. What clusters would the algorithm detect?

7. Species cluster

Well, it depends. It may come up with two groups: dogs and cats.

8. Color cluster

Or, it might make four groups by color: black, grey, white and brown.

9. Origin cluster

Or, it may find groups by origin: the animals in the top row originate from Europe, while those in the bottom row come from Japan. In these examples, I've told you what each group represents. In real life, however, you usually don't know what differentiates your clusters. Your model won't tell you why or how it decided on these clusters; it's up to you to investigate and find out.

10. Clustering models

Some clustering models, like K-Means, require you to specify in advance the number of clusters you would like to identify. Others, like DBSCAN (get ready: "Density-Based Spatial Clustering of Applications with Noise"), don't require you to specify the number of clusters in advance. Instead, they require you to define what constitutes a cluster, such as how close observations must be to count as neighbors and the minimum number of neighbors needed to form a cluster.
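
To show how DBSCAN finds the number of clusters by itself, here is a minimal from-scratch sketch in Python. The parameter names `eps` (neighborhood radius) and `min_samples` (neighbors required, counting the point itself) mirror those of scikit-learn's `DBSCAN`, but the `dbscan` function below is a hypothetical simplified implementation, not the library's code, and the points are made up.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one cluster id per point, -1 for noise."""
    labels = [None] * len(points)  # None = not visited yet

    def neighbors(i):
        # Indices within eps of point i, including i itself.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.0), (1.1, 0.8),
          (5.0, 5.0), (5.2, 5.1), (4.9, 5.0), (5.1, 4.8),
          (9.0, 0.0)]
labels = dbscan(points, eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that we never said "two clusters": that number comes out of the density definition, and the isolated point is labeled as noise rather than being forced into a group.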

11. Iris table

Let's say we have flowers of unknown species. All we have is their petal width and length. See the difference from a classification problem? Here, we don't have a column with the species labels. We don't know which species we're dealing with, or even how many there are.

12. K-Means with 4 clusters

If we hypothesize there are 4 species, we can run K-Means and ask it for 4 clusters, which results in this clustering.

13. K-Means with 3 clusters

If we hypothesize there are 3 species, we require 3 different clusters.
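
Here is a minimal from-scratch sketch of K-Means (Lloyd's algorithm) in Python, run with both hypotheses, k=4 and k=3. One deliberate substitution: it seeds the centroids with a deterministic farthest-first rule so the result is reproducible, whereas scikit-learn's `KMeans` uses k-means++ seeding by default. The petal measurements are hypothetical values made up for illustration, not rows from the real iris dataset.

```python
from math import dist


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm for K-Means. Returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical (petal length, petal width) pairs in cm, made up for illustration;
# these are NOT rows from the real iris dataset.
petals = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3),
          (4.5, 1.4), (4.2, 1.3), (4.7, 1.5),
          (5.8, 2.1), (6.0, 2.3), (5.6, 2.2)]

for k in (4, 3):  # two different hypotheses about the number of species
    labels, _ = kmeans(petals, k)
    print(f"k={k}: {labels}")
```

K-Means always returns exactly k clusters, whether or not k is right; picking k is our hypothesis, not the algorithm's.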

14. Ground truth

This hypothesis turns out to be right: there are three different species in the dataset, Setosa, Virginica and Versicolor.
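
When the true labels happen to be available, as here, we can check how well clusters match them. Cluster ids are arbitrary (cluster 2 is not "species 2"), so a comparison has to ignore the numbering. The sketch below uses purity, one simple such score: each cluster is credited with its most common true label. The cluster ids and species labels are hypothetical; scikit-learn's `adjusted_rand_score` is a commonly used, chance-corrected alternative.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Fraction of observations that carry the majority true label of their cluster."""
    by_cluster = {}
    for c, t in zip(cluster_labels, true_labels):
        by_cluster.setdefault(c, []).append(t)
    # For each cluster, count the observations belonging to its most common label.
    hits = sum(Counter(ts).most_common(1)[0][1] for ts in by_cluster.values())
    return hits / len(true_labels)


# Hypothetical cluster ids and species labels, made up for illustration.
clusters = [0, 0, 0, 2, 2, 1, 1, 1, 1]
species = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3
print(f"purity = {purity(clusters, species):.3f}")  # 8 of 9 points -> 0.889
```

Purity is easy to read but rewards many tiny clusters, so it is only meaningful when the number of clusters is fixed.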

15. Anomaly detection

Let's now talk about anomaly detection.

16. Detecting outliers

Anomaly detection is all about detecting outliers. Outliers are observations that strongly differ from the others.

17. Outliers

In this picture, all of our points are grouped in the bottom left, except for one in the top right. This point is an outlier. It turns out to be the total of all the other observations: the total row wasn't removed before plotting the data.

18. Removing outliers

In this case, the outlier can simply be removed. With two dimensions, it's easy to spot outliers with the naked eye. In 3 dimensions, that might still be doable. But how about 4, 10, 20, or 100 dimensions? That's why we need unsupervised learning algorithms.
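
As a sketch of how an algorithm can flag outliers without looking at a plot, the Python code below scores every row by the distance to its k-th nearest neighbor (a simple k-nearest-neighbor outlier score) and reports the most isolated row. Because it only uses distances, it works the same way in 3 or 100 dimensions. The rows, the choice `k=2`, and the `knn_outlier_scores` helper are hypothetical; the appended row recreates the "forgotten total row" situation. Library methods such as scikit-learn's `IsolationForest` or `LocalOutlierFactor` are common alternatives.

```python
from math import dist


def knn_outlier_scores(rows, k=2):
    """Score each row by the distance to its k-th nearest neighbor (higher = more isolated)."""
    scores = []
    for i, r in enumerate(rows):
        d = sorted(dist(r, other) for j, other in enumerate(rows) if j != i)
        scores.append(d[k - 1])
    return scores


# Hypothetical rows with 3 features each, made up for illustration.
rows = [(10, 4, 7), (12, 5, 6), (9, 4, 8), (11, 6, 7), (10, 5, 6)]
rows.append(tuple(map(sum, zip(*rows))))  # the forgotten "total" row: (52, 24, 34)

scores = knn_outlier_scores(rows, k=2)
outlier = max(range(len(rows)), key=scores.__getitem__)
print(outlier, rows[outlier])  # 5 (52, 24, 34)
```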

19. Some anomaly detection use cases

In our example, the outlier was an error. But detecting outliers can help find which devices fail faster or last longer, which fraudsters trick the protection systems in place, or which patients surprisingly resist a fatal disease.

20. Association

Let's end with association, which consists of finding relationships between observations.

21. Association

In other words, it's about finding events that happen together. It's often used for market basket analysis, which is just a fancy way of asking "Which objects are bought together?" For example, people who buy jam are likely to buy bread, people who buy beer are likely to buy peanuts, and people who buy wine are likely to buy cheese.
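
Association rules are usually judged with two numbers: support (the share of baskets containing both items) and confidence (among baskets containing item A, the share that also contain item B). Below is a minimal brute-force Python sketch that computes both for every pair of items. It is not the Apriori algorithm used by dedicated libraries, only the counting idea behind it, and the baskets, the `pair_rules` helper and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3):
    """Return {(a, b): (support, confidence of a -> b)} for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(set(b)), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:  # keep only pairs bought together often enough
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical shopping baskets, made up for illustration.
baskets = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"beer", "peanuts"},
    {"beer", "peanuts", "bread"},
    {"wine", "cheese"},
    {"wine", "cheese", "bread"},
]

for (a, b), (support, confidence) in sorted(pair_rules(baskets).items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Rules are directional: here everyone who buys jam also buys bread (confidence 1.0), but only half of the bread buyers take jam (confidence 0.5).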

22. Let's practice!

All right, let's check your understanding.
