NMF learns topics of documents
In the video, you learned that when NMF is applied to documents, the components correspond to topics of documents, and the NMF features reconstruct the documents from the topics. Verify this for yourself for the NMF model that you built earlier using the Wikipedia articles. Previously, you saw that the value of NMF feature 3 was high for the articles about actors Anne Hathaway and Denzel Washington. In this exercise, identify the topic of the corresponding NMF component.
The NMF model you built earlier is available as `model`, while `words` is a list of the words that label the columns of the word-frequency array.
After you are done, take a moment to recognize the topic that the articles about Anne Hathaway and Denzel Washington have in common!
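To make the reconstruction idea concrete, here is a minimal sketch using made-up numbers rather than the Wikipedia data. The names `toy_words`, `components`, and `features` and all the values in them are hypothetical, chosen only for illustration; in the exercise, the components come from the fitted `model`.

```python
import numpy as np

# Hypothetical toy vocabulary (the columns of a word-frequency array).
toy_words = ["film", "actor", "award", "species", "climate"]

# Two made-up components (topics): each row weights the words of one topic.
components = np.array([
    [0.9, 0.8, 0.5, 0.0, 0.0],   # a "cinema"-like topic
    [0.0, 0.0, 0.1, 0.7, 0.9],   # a "nature"-like topic
])

# Made-up NMF features for three documents: how strongly each topic is used.
features = np.array([
    [1.0, 0.0],   # only the first topic
    [0.0, 2.0],   # only the second topic, twice as strongly
    [0.5, 0.5],   # an even mix of both topics
])

# Each document is approximated by a non-negative combination of the topics.
reconstructed = features @ components
print(reconstructed.shape)  # (3, 5): one row per document, one column per word
print(reconstructed[2])     # word weights for the mixed document
```

A document whose feature value is high for one component therefore inherits most of its word weights from that component, which is why inspecting the component's largest entries reveals the shared topic.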
This exercise is part of the course
Unsupervised Learning in Python
Exercise instructions
- Import `pandas` as `pd`.
- Create a DataFrame `components_df` from `model.components_`, setting `columns=words` so that columns are labeled by the words.
- Print `components_df.shape` to check the dimensions of the DataFrame.
- Use the `.iloc[]` accessor on the DataFrame `components_df` to select row `3`. Assign the result to `component`.
- Call the `.nlargest()` method of `component`, and print the result. This gives the five words with the highest values for that component.
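The steps above can be tried on a small stand-in before touching the real model. The sketch below assumes a hypothetical 4-by-6 array `toy_components` and word list `toy_words` in place of `model.components_` and `words`; the numbers are invented, so only the pattern of calls carries over.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the exercise's `words` and `model.components_`.
toy_words = ["film", "actor", "award", "species", "climate", "music"]
toy_components = np.array([
    [0.1, 0.0, 0.2, 0.9, 0.8, 0.0],
    [0.0, 0.1, 0.3, 0.0, 0.1, 0.9],
    [0.2, 0.0, 0.0, 0.4, 0.7, 0.1],
    [0.9, 0.8, 0.6, 0.0, 0.1, 0.3],
])

# One row per component, one column per word.
components_df = pd.DataFrame(toy_components, columns=toy_words)
print(components_df.shape)  # (4, 6)

# .iloc[] is position-based and zero-indexed, so 3 selects the fourth row.
component = components_df.iloc[3]

# Called without arguments, Series.nlargest() returns the 5 largest values.
top_words = component.nlargest()
print(top_words)
```

Because the columns are labeled by words, the index of the resulting Series reads directly as the top words of the component.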
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import pandas
import pandas as pd
# Create a DataFrame: components_df
components_df = ____
# Print the shape of the DataFrame
print(components_df.shape)
# Select row 3: component
component = ____
# Print result of nlargest
print(component.nlargest())