Which articles are similar to 'Cristiano Ronaldo'?
In the video, you learned how to use NMF features and the cosine similarity to find similar articles.
Apply this to your NMF model for popular Wikipedia articles by finding the articles most similar to the article about the footballer Cristiano Ronaldo. The NMF features you obtained earlier are available as nmf_features, while titles is a list of the article titles.
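Normalizing first is what lets a plain dot product stand in for cosine similarity: once every row has unit length, the denominator of the cosine formula equals 1. The sketch below checks this on two made-up feature rows; the values are illustrative only and are not real NMF output.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Two made-up feature rows (illustrative values, not real NMF output)
features = np.array([[3.0, 4.0],
                     [1.0, 0.0]])
a, b = features

# Cosine similarity straight from the definition
cosine = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Same quantity via L2-normalized rows and a plain dot product
unit = normalize(features)  # default is norm='l2', applied row by row
dot_of_units = unit[0].dot(unit[1])

print(cosine, dot_of_units)
```

Both printed values should agree up to floating-point rounding.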
This exercise is part of the course Unsupervised Learning in Python.
Exercise instructions
- Import normalize from sklearn.preprocessing.
- Apply the normalize() function to nmf_features. Store the result as norm_features.
- Create a DataFrame df from norm_features, using titles as an index.
- Use the .loc[] accessor of df to select the row of 'Cristiano Ronaldo'. Assign the result to article.
- Apply the .dot() method of df to article to calculate the cosine similarity of every row with article.
- Print the result of the .nlargest() method of similarities to display the most similar articles. This has been done for you, so hit 'Submit Answer' to see the result!
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Perform the necessary imports
import pandas as pd
from ____ import ____
# Normalize the NMF features: norm_features
norm_features = ____
# Create a DataFrame: df
df = ____
# Select the row corresponding to 'Cristiano Ronaldo': article
article = df.loc[____]
# Compute the dot products: similarities
similarities = ____
# Display those with the largest cosine similarity
print(similarities.nlargest())
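For reference, here is one way the steps could fit together end to end. It is a minimal sketch, not the official solution: the nmf_features array and every title other than 'Cristiano Ronaldo' are small hypothetical stand-ins so the code runs on its own, whereas in the exercise both variables are already loaded for you.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import normalize

# Hypothetical stand-ins for the exercise's preloaded variables
titles = ['Cristiano Ronaldo', 'Sports article A', 'Sports article B',
          'Tech article A', 'Tech article B']
nmf_features = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.7, 0.0, 0.1],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
])

# Scale every row to unit length
norm_features = normalize(nmf_features)

# Index the normalized features by article title
df = pd.DataFrame(norm_features, index=titles)

# Select the row for the article of interest
article = df.loc['Cristiano Ronaldo']

# Dot product of each unit-length row with the selected row = cosine similarity
similarities = df.dot(article)

# Show the three most similar articles (the article itself ranks first)
top = similarities.nlargest(3)
print(top)
```

The selected article always appears at the top with a similarity of about 1, since it is being compared with itself; the entries after it are the ones of interest.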