Unsupervised learning: basics

1. Unsupervised learning: basics

Hi everyone! Welcome to the first video of this course. In this video, we will focus on unsupervised learning, the business problems that are solved using such techniques, and basic plotting of points, which will help us later in the course. Let's get started.

2. Everyday example: Google News

While browsing through Google News, have you wondered what goes on behind the grouping of news items? How does the algorithm decide which articles are similar? It is the result of an unsupervised learning algorithm, which scans through the text of each article and groups articles together based on frequently occurring terms. The group of articles shown here is about the Indian cricket team. Through this course, you will be introduced to various clustering techniques. Similar to this example, you will also perform document clustering on text.

3. Labeled and unlabeled data

Before we define unsupervised learning, let us try to understand two terms: labeled and unlabeled data. Imagine you have a list of points with X and Y coordinates. If only the coordinates of the points are available and there is no other characteristic to distinguish the data points, it is called unlabeled data. In contrast, if we associate each data point with a group beforehand, say normal and danger zones, we call it labeled data.
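To make the distinction concrete, here is a minimal sketch of how the two kinds of data might be written down in Python. The coordinates and the variable names are made up for illustration; only the "normal" and "danger" group names come from the example above.

```python
# Unlabeled data: only the (x, y) coordinates are known.
# These coordinates are illustrative values, not data from the course.
unlabeled_points = [(1.0, 2.0), (1.5, 1.8), (8.0, 9.0), (8.5, 9.5)]

# Labeled data: each point is associated with a group beforehand.
labeled_points = [
    ((1.0, 2.0), "normal"),
    ((1.5, 1.8), "normal"),
    ((8.0, 9.0), "danger"),
    ((8.5, 9.5), "danger"),
]

# The coordinates are the same in both cases; only the labels differ.
coordinates_only = [point for point, label in labeled_points]
```

Stripping the labels from the second list gives back the first one, which is exactly the situation unsupervised learning starts from.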

4. What is unsupervised learning?

What is unsupervised learning? It is an umbrella term for a group of machine learning algorithms that are used to find patterns. The data used by these algorithms is not labeled, classified or characterized before the algorithm is run. The algorithm is run, therefore, to find and explain inherent structures within the data. Common unsupervised learning techniques include clustering, anomaly detection, and some types of neural networks. Clustering is used to group similar data points together.

5. What is clustering?

Let us now move on to a specific class of unsupervised learning algorithms. Clustering is the process of grouping items with similar characteristics. The groups are formed so that items in a single group are closer to each other, in terms of some characteristics, than they are to items in other groups. Clustering falls under unsupervised learning because the data is not labeled, grouped or characterized beforehand. A simple example to demonstrate clustering would be to group points on a 2D plane based on their distance. Let us try to visualize it in Python.
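Before plotting anything, the idea of grouping points by distance can be sketched in a few lines. This is only an illustration under a strong assumption: the two reference centers are given by hand, whereas an actual clustering algorithm would find them from the data. The points, the centers and the helper name nearest_center are all hypothetical.

```python
import math

# Illustrative points on a 2D plane (made-up values).
points = [(1, 2), (2, 1), (9, 9), (10, 8), (1, 1), (9, 10)]

# Assumption: the group centers are known in advance.
# A real clustering algorithm would estimate these from the data.
centers = [(1.5, 1.5), (9.5, 9.0)]


def nearest_center(point, centers):
    """Return the index of the center closest to the point (Euclidean distance)."""
    distances = [math.dist(point, center) for center in centers]
    return distances.index(min(distances))


# Assign each point to the group of its nearest center.
groups = [nearest_center(point, centers) for point in points]
print(groups)
```

Each point ends up in the group whose center it is closest to, which is the sense of "closer to each other" used in the definition above.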

6. Plotting data for clustering - Pokemon sightings

To make a scatter plot, we will use the pyplot module of the matplotlib library in Python. We will plot the sightings of Pokemon in a park in the form of coordinates. The first step, therefore, is to import the required module as plt. Next, we define the coordinates of the points to be plotted in two lists, one each for the x and y coordinates. In this case, we have fifteen sightings that we would like to plot. Finally, we use the scatter function of pyplot, with the lists of coordinates as arguments, and the show function to display the plot. Let us see how the plot looks.
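The steps just described can be sketched as follows. The coordinate values and the list names x_coordinates and y_coordinates are illustrative assumptions, chosen so that fifteen sightings fall into visibly separate groups; they are not the values shown in the video.

```python
# Import the pyplot module under its conventional alias.
from matplotlib import pyplot as plt

# Fifteen illustrative sightings (made-up values), one list per axis.
x_coordinates = [82, 90, 85, 95, 88, 8, 14, 5, 11, 19, 45, 55, 50, 60, 47]
y_coordinates = [88, 94, 91, 90, 97, 55, 50, 46, 60, 53, 22, 5, 12, 20, 9]

# Draw one marker per sighting, then display the figure.
plt.scatter(x_coordinates, y_coordinates)
plt.show()
```

The two lists must have the same length, since each position pairs one x value with one y value.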

7. Plotting data for clustering - the scatter plot

Here is how the plot looks. As a preliminary step before you perform any clustering analysis on the points, visualizing them helps you understand how many natural clusters are present in the data. There are three clusters in the data.

8. Plotting data for clustering - clusters

The three clusters are highlighted in the plot. By visualizing this data, you can infer with some confidence where the Pokemon actually are!

9. Up next - some practice

This is a simplified case. Real-life problems may not have an obvious solution, and you may need further analysis to work out how many clusters there actually are, which you will learn later in the course. It is time for some exercises based on this video.
