NMF features of the Wikipedia articles
Now you will explore the NMF features you created in the previous exercise. A solution to the previous exercise has been pre-loaded, so the array nmf_features is available. Also available is a list titles giving the title of each Wikipedia article.
When investigating the features, notice that for both actors, the NMF feature 3 has by far the highest value. This means that both articles are reconstructed using mainly the 3rd NMF component. In the next video, you'll see why: NMF components represent topics (for instance, acting!).
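To make "reconstructed using mainly one component" concrete, here is a minimal sketch with made-up numbers. It assumes the usual NMF relationship in which an article's row is approximated by its feature values multiplied by the component matrix. The arrays components and features are hypothetical toy values, not the Wikipedia data, and the dominant feature sits at zero-based index 3 only because the toy data was built that way.

```python
import numpy as np

# Hypothetical toy values, not taken from the Wikipedia data:
# 4 components (rows) over a 5-word vocabulary (columns).
components = np.array([
    [0.9, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.8],
])

# NMF features of a single article: the entry at index 3 dominates.
features = np.array([0.01, 0.00, 0.02, 0.45])

# The article's word-frequency row is approximated by features @ components.
reconstruction = features @ components

# Contribution of the dominant component on its own.
dominant = int(np.argmax(features))
dominant_only = features[dominant] * components[dominant]

print("dominant feature index:", dominant)
print("full reconstruction:   ", reconstruction)
print("dominant component only:", dominant_only)
```

Because the other feature values are close to zero, the two printed rows are nearly the same, which is what it means for an article to be built mainly from one component.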
This exercise is part of the course Unsupervised Learning in Python.
Exercise instructions
- Import pandas as pd.
- Create a DataFrame df from nmf_features using pd.DataFrame(). Set the index to titles using index=titles.
- Use the .loc[] accessor of df to select the row with title 'Anne Hathaway', and print the result. These are the NMF features for the article about the actress Anne Hathaway.
- Repeat the last step for 'Denzel Washington' (another actor).
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import pandas
____
# Create a pandas DataFrame: df
df = ____
# Print the row for 'Anne Hathaway'
print(____)
# Print the row for 'Denzel Washington'
print(____)
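
One way the blanks above could be filled in is sketched below. In the exercise, nmf_features and titles are pre-loaded. Here they are replaced by small made-up stand-ins so the snippet runs on its own: the feature values and the 'Placeholder article' title are hypothetical and chosen only so that column 3 dominates for both actors.

```python
import numpy as np
import pandas as pd

# Stand-ins for the pre-loaded objects (hypothetical values).
titles = ['Anne Hathaway', 'Denzel Washington', 'Placeholder article']
nmf_features = np.array([
    [0.00, 0.01, 0.00, 0.58, 0.00, 0.00],
    [0.00, 0.01, 0.00, 0.42, 0.00, 0.00],
    [0.39, 0.00, 0.00, 0.00, 0.01, 0.00],
])

# Create a pandas DataFrame with one row per article, indexed by title.
df = pd.DataFrame(nmf_features, index=titles)

# Select each actor's row by its index label and print it.
print(df.loc['Anne Hathaway'])
print(df.loc['Denzel Washington'])

# Optional check: column label of the largest feature for each actor.
print(df.loc[['Anne Hathaway', 'Denzel Washington']].idxmax(axis=1))
```

Using the titles as the index is what lets .loc[] look up a row by article name instead of by position.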