Dominant colors in images

1. Dominant colors in images

In the final chapter of this course, let us apply clustering to real-world problems. In this first video, we will analyze images to determine their dominant colors.

2. Dominant colors in images

Any image consists of pixels, and each pixel represents a dot in the image. A pixel consists of three values - each value is a number between 0 and 255, representing the amount of its red, green and blue components. The combination of these forms the actual color of the pixel. To find the dominant colors, we will perform k-means clustering on the RGB components of the pixels. One important use of k-means clustering on images is to segment satellite images to identify surface features.

3. Feature identification in satellite images

In this satellite image, you can see the terrain of a river valley. Different colors typically belong to different features. K-means clustering can be used to group pixels of similar color, and the resulting groups can then be matched to surface features like water and vegetation.

4. Tools to find dominant colors

This video introduces two additional functions from matplotlib. The first is imread, from matplotlib's image module, which converts a JPEG image into a matrix containing the RGB values of each pixel. The second is imshow, from matplotlib's pyplot module, which we will use to display the colors of the cluster centers once we have performed k-means clustering on the RGB values.
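As a rough illustration of the first of these, the sketch below writes a tiny synthetic JPEG to a temporary folder and reads it back with imread. The file name demo.jpg and the pixel values are made up for this example; it does not use the image from the video.

```python
import os
import tempfile

import numpy as np
import matplotlib.image as img

# Hypothetical 4x6 test image: top half light blue, bottom half blue-green
synthetic = np.zeros((4, 6, 3), dtype=np.uint8)
synthetic[:2] = (135, 206, 235)
synthetic[2:] = (20, 130, 140)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.jpg")  # made-up file name
    img.imsave(path, synthetic)              # write the JPEG to disk
    image = img.imread(path)                 # read it back as an array

print(image.shape)  # rows, columns, color channels
print(image.dtype)
```

Because JPEG compression is lossy, the values read back may differ slightly from the ones written, but the shape of the array is preserved.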

5. Test image

In this video, let us perform k-means clustering on this image of the sea. Notice that there are two dominant colors in the image - a blue-green color of the sea water, and a light blue sky.

6. Convert image to RGB matrix

The first step in the process is to convert the image to pixels using the imread method of the image module. Notice that the output of this function is an MxNx3 matrix, where M and N are the dimensions of the image. In this analysis, we look at all pixels collectively and their position does not matter, so we simply extract the red, green and blue values and store each in its own list.
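The extraction step could look roughly like the sketch below. To keep it self-contained, a small synthetic array stands in for the output of imread; its size and colors are assumptions, not the image from the video.

```python
import numpy as np

# Stand-in for the MxNx3 array that imread returns (assumed values)
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[:2] = (135, 206, 235)  # light blue rows
image[2:] = (20, 130, 140)   # blue-green rows

r, g, b = [], [], []
for row in image:        # each row holds N pixels
    for pixel in row:    # each pixel holds three values
        red, green, blue = pixel
        r.append(int(red))
        g.append(int(green))
        b.append(int(blue))

print(len(r), len(g), len(b))  # one entry per pixel in each list
```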

7. DataFrame with RGB values

Once the lists are created, we store them in a pandas DataFrame.
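A minimal sketch of this step, again using a synthetic stand-in image. The column names red, green and blue are chosen for illustration, and the lists are built with a reshape instead of a loop to keep the example short.

```python
import numpy as np
import pandas as pd

# Stand-in for the MxNx3 array that imread returns (assumed values)
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[:2] = (135, 206, 235)
image[2:] = (20, 130, 140)

# Flatten the M x N grid into one row per pixel, then split the channels
flat = image.reshape(-1, 3)
r = flat[:, 0].tolist()
g = flat[:, 1].tolist()
b = flat[:, 2].tolist()

# One row per pixel, one column per color component
pixels = pd.DataFrame({"red": r, "green": g, "blue": b})
print(pixels.head())
```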

8. Create an elbow plot

Here is the code to create an elbow plot from the last chapter.

9. Elbow plot

Once we scale the RGB values of the list of pixels, we create the elbow plot to see how many dominant colors are present in the image. Notice that the elbow plot indicates two clusters, which supports our initial observation of two prominent colors in the image.
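The sketch below shows one way to produce such a plot. It assumes the whiten and kmeans functions from scipy.cluster.vq for scaling and clustering, and it uses synthetic pixels drawn around two made-up colors rather than the image from the video.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, whiten

# Synthetic pixels (assumed data): two noisy colors, 200 pixels each
rng = np.random.default_rng(0)
sky = rng.normal((135, 206, 235), 5, size=(200, 3))
sea = rng.normal((20, 130, 140), 5, size=(200, 3))
rgb = np.clip(np.vstack([sky, sea]), 0, 255)

# Scale each color component by its standard deviation
scaled = whiten(rgb)

# Record the distortion for each candidate number of clusters
num_clusters = list(range(1, 6))
distortions = []
for k in num_clusters:
    _, distortion = kmeans(scaled, k)
    distortions.append(distortion)

# Draw the elbow plot
fig, ax = plt.subplots()
ax.plot(num_clusters, distortions, marker="o")
ax.set_xlabel("Number of clusters")
ax.set_ylabel("Distortion")
ax.set_xticks(num_clusters)
```

In an interactive session, call plt.show() to view the figure. With data like this, the distortion drops sharply from one cluster to two and changes little afterwards.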

10. Find dominant colors

The cluster centers obtained are standardized RGB values. Recall that, in this course, the standardized value of a variable is its actual value divided by its standard deviation. We will display the colors with the imshow method, which expects floating-point RGB values in the range 0 to 1. To get there, we first multiply the standardized values of each cluster center by the corresponding standard deviations, which recovers the actual RGB values. Since actual RGB values have a maximum of 255, we then divide the result by 255 to get scaled values in the range 0 to 1.
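The conversion can be sketched as follows. The example assumes whiten and kmeans from scipy.cluster.vq and the same kind of synthetic two-color pixels as a stand-in for the real image; the variable names are illustrative.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

# Synthetic pixels (assumed data): two noisy colors, 200 pixels each
rng = np.random.default_rng(0)
sky = rng.normal((135, 206, 235), 5, size=(200, 3))
sea = rng.normal((20, 130, 140), 5, size=(200, 3))
rgb = np.clip(np.vstack([sky, sea]), 0, 255)

# Standard deviation of each color component, as used by whiten
stds = rgb.std(axis=0)
scaled = whiten(rgb)

# Cluster centers are in standardized units
cluster_centers, _ = kmeans(scaled, 2)

colors = []
for center in cluster_centers:
    scaled_r, scaled_g, scaled_b = center
    colors.append((
        scaled_r * stds[0] / 255,  # undo the scaling, then map to 0-1
        scaled_g * stds[1] / 255,
        scaled_b * stds[2] / 255,
    ))

print(colors)
```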

11. Display dominant colors

Once we have the colors with their RGB values, the imshow method is used to display them. Note that you need to wrap the colors variable in a list, as the imshow method expects an MxNx3 matrix to display a 2D grid of colors. By doing this, we provide a 1xNx3 matrix, which tells the imshow method to display a single row of colors, where N is the number of clusters. Here are the two dominant colors, which support our preliminary observations.
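A minimal sketch of the display step. The two colors below are assumed example values already scaled to the range 0 to 1, not the ones computed in the video.

```python
import matplotlib.pyplot as plt

# Assumed example colors, already scaled to the range 0 to 1
colors = [(0.08, 0.51, 0.55), (0.53, 0.81, 0.92)]

fig, ax = plt.subplots()
# Wrapping colors in a list gives a 1 x 2 x 3 grid: one row, two colors
shown = ax.imshow([colors])
ax.set_axis_off()
```

In an interactive session, call plt.show() to view the row of colors.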

12. Next up: exercises

Let us now move on to exercises.