NMF identifies the topics of documents

In the video, you learned that when NMF is applied to documents, the components correspond to topics, and the NMF features reconstruct the documents from those topics. Verify this for yourself with the NMF model you built earlier from the Wikipedia articles. Previously, you saw that the third NMF feature value was high for the articles about the actors Anne Hathaway and Denzel Washington. In this exercise, identify the topic of the corresponding NMF component.
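As a rough illustration of how NMF features reconstruct documents from components, here is a minimal sketch on a made-up word-count array. The names toy_words, toy_counts, toy_model, toy_features and toy_components are hypothetical stand-ins, not the course's Wikipedia data or its model, and the NMF settings (init, random_state, max_iter) are assumptions chosen only to make the run repeatable.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical toy data (an assumption, not the course dataset):
# 4 documents x 5 words. Documents 0-1 only use the first three words,
# documents 2-3 only use the last two.
toy_words = ["film", "actor", "award", "goal", "league"]
toy_counts = np.array([
    [3.0, 2.0, 1.0, 0.0, 0.0],
    [2.0, 3.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0, 3.0],
    [0.0, 0.0, 0.0, 2.0, 4.0],
])

# Fit a 2-component NMF; the settings here are illustrative assumptions
toy_model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
toy_features = toy_model.fit_transform(toy_counts)  # one row per document
toy_components = toy_model.components_              # one row per component (topic)

# Each document is approximated by its feature-weighted sum of the components
reconstruction = toy_features @ toy_components

print(toy_features.shape, toy_components.shape)
print(np.round(reconstruction, 1))
```

Each row of toy_components assigns a non-negative weight to every word, which is why a component can be read as a topic. Each row of toy_features says how much of each topic a document contains.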

The NMF model you built earlier is available as model, while words is a list of the words that label the columns of the word-frequency array.

When you're done, take a moment to recognize the topic that the articles about Anne Hathaway and Denzel Washington have in common!

This exercise is part of the course

Unsupervised Learning in Python

Exercise instructions

Import pandas as pd.
Create a DataFrame components_df from model.components_, setting columns=words so that the columns are labeled by the words.
Print components_df.shape to check the dimensions of the DataFrame.
Use the .iloc[] accessor on the DataFrame components_df to select row 3. Assign the result to component.
Call the .nlargest() method of component, and print the result. This gives the five words with the highest values for that component.
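The steps above can be sketched on a small made-up array. This is only an illustration: toy_components and toy_words are hypothetical stand-ins for model.components_ and words, and the values are invented. Note that .iloc[] is positional and zero-based, and that Series.nlargest() returns five entries by default.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for model.components_ and words (invented values)
toy_words = ["film", "actor", "award", "goal", "league", "album"]
toy_components = np.array([
    [0.00, 0.10, 0.00, 0.90, 0.80, 0.00],
    [0.10, 0.00, 0.20, 0.00, 0.00, 0.90],
    [0.00, 0.20, 0.10, 0.30, 0.00, 0.10],
    [0.90, 0.70, 0.50, 0.00, 0.10, 0.05],
])

# Rows are components, columns are labeled by the words
toy_df = pd.DataFrame(toy_components, columns=toy_words)
print(toy_df.shape)  # (number of components, number of words)

# .iloc is zero-based, so .iloc[3] selects the row at position 3
toy_component = toy_df.iloc[3]

# nlargest() defaults to n=5: the five words with the highest values
top_words = toy_component.nlargest()
print(top_words)
```

In this toy array the selected row puts its largest weights on "film", "actor" and "award", which is the kind of pattern that lets a component be read as a topic.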

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import pandas
import pandas as pd

# Create a DataFrame: components_df
components_df = ____

# Print the shape of the DataFrame
print(components_df.shape)

# Select row 3: component
component = ____

# Print result of nlargest
print(component.nlargest())


Learn how to discover the underlying groups (or "clusters") in a dataset. By the end of this chapter, you'll be clustering companies using their stock market prices, and distinguishing different species by clustering their measurements.

Exercise 1: Unsupervised Learning
Exercise 2: How many clusters?
Exercise 3: Clustering 2D points
Exercise 4: Inspect your clustering
Exercise 5: Evaluating a clustering
Exercise 6: How many clusters of grain?
Exercise 7: Evaluating the grain clustering
Exercise 8: Transforming features for better clusterings
Exercise 9: Scaling fish data for clustering
Exercise 10: Clustering the fish data
Exercise 11: Clustering stocks using KMeans
Exercise 12: Which stocks move together?

In this chapter, you'll learn about two unsupervised learning techniques for data visualization, hierarchical clustering and t-SNE. Hierarchical clustering merges the data samples into ever-coarser clusters, yielding a tree visualization of the resulting cluster hierarchy. t-SNE maps the data samples into 2d space so that the proximity of the samples to one another can be visualized.

Exercise 1: Visualizing hierarchies
Exercise 2: How many merges?
Exercise 3: Hierarchical clustering of the grain data
Exercise 4: Hierarchies of stocks
Exercise 5: Cluster labels in hierarchical clustering
Exercise 6: Which clusters are closest?
Exercise 7: Different linkage, different hierarchical clustering!
Exercise 8: Intermediate clusterings
Exercise 9: Extracting the cluster labels
Exercise 10: t-SNE for 2-dimensional maps
Exercise 11: t-SNE visualization of grain dataset
Exercise 12: A t-SNE map of the stock market

Dimension reduction summarizes a dataset using its commonly occurring patterns. In this chapter, you'll learn about the most fundamental of dimension reduction techniques, "Principal Component Analysis" ("PCA"). PCA is often used before supervised learning to improve model performance and generalization. It can also be useful for unsupervised learning. For example, you'll employ a variant of PCA that will allow you to cluster Wikipedia articles by their content!

Exercise 1: Visualizing the PCA transformation
Exercise 2: Correlated data in nature
Exercise 3: Decorrelating the grain measurements with PCA
Exercise 4: Principal components
Exercise 5: Intrinsic dimension
Exercise 6: The first principal component
Exercise 7: Variance of the PCA features
Exercise 8: Intrinsic dimension of the fish data
Exercise 9: Dimension reduction with PCA
Exercise 10: Dimension reduction of the fish measurements
Exercise 11: A tf-idf word-frequency array
Exercise 12: Clustering Wikipedia part I
Exercise 13: Clustering Wikipedia part II

In this chapter, you'll learn about a dimension reduction technique called "Non-negative matrix factorization" ("NMF") that expresses samples as combinations of interpretable parts. For example, it expresses documents as combinations of topics, and images in terms of commonly occurring visual patterns. You'll also learn to use NMF to build recommender systems that can find you similar articles to read, or musical artists that match your listening history!

Exercise 1: Non-negative matrix factorization (NMF)
Exercise 2: Non-negative data
Exercise 3: NMF applied to Wikipedia articles
Exercise 4: NMF features of the Wikipedia articles
Exercise 5: NMF reconstructs samples
Exercise 6: NMF learns interpretable parts
Exercise 7: NMF identifies the topics of documents (current exercise)
Exercise 8: Explore the LED digits dataset
Exercise 9: NMF identifies the parts of images
Exercise 10: PCA doesn't learn parts
Exercise 11: Building recommender systems using NMF
Exercise 12: Which articles are similar to "Cristiano Ronaldo"?
Exercise 13: Recommend musical artists part I
Exercise 14: Recommend musical artists part II
Exercise 15: Final thoughts