1. Non-negative matrix factorization (NMF)
2. Non-negative matrix factorization
NMF stands for "non-negative matrix factorization". NMF, like PCA, is a dimension reduction technique. In contrast to PCA, however, NMF models are interpretable. This means NMF models are easier to understand yourself, and much easier for you to explain to others. NMF cannot be applied to every dataset, however: it requires that all the sample features be non-negative, that is, greater than or equal to 0.
3. Interpretable parts
NMF achieves its interpretability by decomposing samples as sums of their parts. For example, NMF decomposes documents as combinations of common themes,
4. Interpretable parts
and images as combinations of common patterns. You'll learn about both these examples in detail later. For now, let's focus on getting started.
5. Using scikit-learn NMF
NMF is available in scikit-learn, and follows the same fit/transform pattern as PCA. However, unlike PCA, the desired number of components must always be specified. NMF works with NumPy arrays and with sparse arrays in the csr_matrix format.
6. Example word-frequency array
Let's see an application of NMF to a toy example of a word-frequency array. In this toy dataset, there are only 4 words in the vocabulary, and these correspond to the 4 columns of the word-frequency array. Each row represents a document, and the entries of the array measure the frequency of each word in the document using what's known as "tf-idf". "tf" is the frequency of the word in the document: if 10% of the words in the document are "datacamp", then the tf of "datacamp" for that document is 0.1. "idf" is a weighting scheme that reduces the influence of frequent words like "the".
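The course builds this array for you, but as a rough sketch of how such an array could be computed, here is scikit-learn's TfidfVectorizer applied to a few made-up documents. The corpus below is hypothetical, not the course's toy dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus (not the course's toy dataset)
documents = [
    "datacamp courses are great",
    "great courses teach great skills",
    "datacamp skills",
]

# TfidfVectorizer computes tf-idf values: word frequencies,
# down-weighted for words that occur in many documents
tfidf = TfidfVectorizer()
word_freq = tfidf.fit_transform(documents)   # sparse csr_matrix

print(tfidf.get_feature_names_out())  # the vocabulary (one entry per column)
print(word_freq.toarray())            # one row per document
```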
7. Example usage of NMF
Let's now see how to use NMF in Python. Firstly, import NMF. Create a model, specifying the desired number of components; let's specify 2. Fit the model to the samples, then use the fitted model to perform the transformation.
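Here is a minimal runnable sketch of those steps. The 4-by-4 word-frequency array is made up to stand in for the toy dataset; only the NMF calls follow the pattern described above.

```python
import numpy as np
from sklearn.decomposition import NMF

# Made-up word-frequency array: 4 documents (rows) x 4 words (columns),
# with non-negative entries standing in for tf-idf values
samples = np.array([
    [0.5, 0.0, 0.3, 0.0],
    [0.0, 0.4, 0.0, 0.6],
    [0.4, 0.0, 0.2, 0.0],
    [0.0, 0.5, 0.0, 0.5],
])

model = NMF(n_components=2)              # the number of components is required
model.fit(samples)                       # learn the components from the samples
nmf_features = model.transform(samples)  # one row of feature values per sample
print(nmf_features)
```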
8. NMF components
Just as PCA has principal components, NMF has components which it learns from the samples, and as with PCA, the dimension of the components is the same as the dimension of the samples. In our example, for instance, there are 2 components, and they live in 4 dimensional space, corresponding to the 4 words in the vocabulary. The entries of the NMF components are always non-negative.
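Continuing the sketch above, the learned components can be inspected through the model's components_ attribute:

```python
# Continuing the sketch above: 2 components, each living in
# 4-dimensional space (one dimension per word in the vocabulary)
print(model.components_.shape)  # (2, 4)
print(model.components_)        # all entries are non-negative
```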
9. NMF features
The NMF feature values are non-negative, as well. As we saw with PCA, our transformed data in this example will have two columns, corresponding to our two new features. The features and the components of an NMF model can be combined to approximately reconstruct the original data samples.
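Continuing the sketch, the transformed feature array looks like this:

```python
# Continuing the sketch: the transformed data has one column per
# component, and the NMF feature values are also non-negative
print(nmf_features.shape)  # (4, 2): 4 documents, 2 features
print(nmf_features)
```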
10. Reconstruction of a sample
Let's see how this works with a single data sample. Here is a sample representing a document from our toy dataset, and here are its NMF feature values. Now if we multiply each NMF component by the corresponding NMF feature value, and add up the columns, we get something very close to the original sample.
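In code, continuing the sketch above, the reconstruction of a single sample might look like this:

```python
# Continuing the sketch: reconstruct the first document
sample = samples[0]               # the original data sample
feature_values = nmf_features[0]  # its two NMF feature values

# Scale each component by the matching feature value, then add up
reconstruction = (feature_values[0] * model.components_[0]
                  + feature_values[1] * model.components_[1])
print(sample)          # the original sample
print(reconstruction)  # approximately equal to the original
```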
11. Sample reconstruction
So a sample can be reconstructed by multiplying the NMF components by the NMF feature values of the sample, and adding up. This calculation can also be expressed as what is known as a product of matrices. We won't be using that point of view, but that's where the "matrix factorization", or "MF", in NMF comes from.
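As a matrix product, continuing the sketch, all the samples can be reconstructed at once:

```python
# The matrix product of the features (4 x 2) and the
# components (2 x 4) approximates the original data (4 x 4)
approximation = nmf_features @ model.components_
print(np.round(approximation, 2))
print(np.round(samples, 2))
```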
12. NMF fits to non-negative data only
Finally, remember that NMF can only be applied to arrays of non-negative data, such as word-frequency arrays. In the next video, you'll construct another example by encoding collections of images as non-negative arrays. There are many other great examples as well, such as arrays encoding audio spectrograms, and arrays representing purchase histories on e-commerce sites.
13. Let's practice!
In this video, you've learned the basics of NMF. Now let's practice using it.