K-means clustering: first exercise

This exercise helps you get familiar with running k-means on a dataset. We will use the Comic Con dataset and see how k-means performs on it.

Recall the two steps of k-means:

Define the cluster centers with the kmeans() function. It takes two required arguments: the observations and the number of clusters.
Assign cluster labels with the vq() function. It takes two required arguments: the observations and the cluster centers.
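
The two steps above can be sketched on a small synthetic array. This is a minimal illustration, not the exercise solution: the toy data and the variable names (observations, manual_labels) are assumptions made for the example. The last lines repeat the labeling step by hand to show that vq() assigns each observation to its nearest center.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Hypothetical toy data: two well-separated blobs of 2-D points
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
observations = np.vstack([blob_a, blob_b])

# Step 1: kmeans() returns the cluster centers and a mean distortion value
cluster_centers, distortion = kmeans(observations, 2)

# Step 2: vq() returns one label per observation and the distance
# from each observation to its assigned center
labels, distances = vq(observations, cluster_centers)

# The same assignment done by hand: pick the nearest center (Euclidean distance)
pairwise = np.linalg.norm(
    observations[:, None, :] - cluster_centers[None, :, :], axis=2
)
manual_labels = pairwise.argmin(axis=1)
```

Which blob ends up as cluster 0 and which as cluster 1 is arbitrary, since kmeans() starts from randomly chosen observations.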

The data is stored in a pandas DataFrame, comic_con. x_scaled and y_scaled are the names of the columns holding the standardized X and Y coordinates of people at a given point in time.
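
The exercise does not show how the standardized columns were produced. One common way to do this with SciPy is whiten(), which divides each column by its standard deviation; that the course data was prepared this way is an assumption, and the raw values and column names x_coordinate and y_coordinate below are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import whiten

# Hypothetical raw coordinates; the real comic_con values are not shown here
raw = pd.DataFrame({
    "x_coordinate": [17.0, 20.0, 35.0, 14.0, 37.0, 33.0],
    "y_coordinate": [4.0, 6.0, 0.0, 0.0, 4.0, 3.0],
})

# whiten() rescales each column to unit standard deviation
# (it divides by the standard deviation and does not subtract the mean)
scaled = whiten(raw[["x_coordinate", "y_coordinate"]].to_numpy())
raw["x_scaled"] = scaled[:, 0]
raw["y_scaled"] = scaled[:, 1]
```

After this step both scaled columns have a standard deviation of 1, so neither axis dominates the distance calculations in k-means.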

This exercise is part of the course

Cluster Analysis in Python

Exercise instructions

Import the kmeans and vq functions from SciPy.
Generate the cluster centers using the kmeans() function with two clusters.
Create cluster labels using these centers.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Plotting libraries used by the final step
import matplotlib.pyplot as plt
import seaborn as sns

# Import the kmeans and vq functions
from ____.cluster.vq import ____, ____

# Generate cluster centers
cluster_centers, distortion = ____

# Assign cluster labels
comic_con['cluster_labels'], distortion_list = ____

# Plot clusters
sns.scatterplot(x='x_scaled', y='y_scaled',
                hue='cluster_labels', data=comic_con)
plt.show()
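
One possible way to fill in the blanks is sketched below. It is an illustration under stated assumptions rather than the official solution: the real comic_con DataFrame is not available here, so make_stand_in_data() is a hypothetical helper that builds a stand-in with the same two column names, and plot_clusters() is a hypothetical wrapper around the plotting lines of the template.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import kmeans, vq


def make_stand_in_data(n_per_group=40):
    """Build a hypothetical stand-in for comic_con: two groups of scaled coordinates."""
    rng = np.random.default_rng(42)
    group_a = rng.normal(loc=[1.0, 1.0], scale=0.2, size=(n_per_group, 2))
    group_b = rng.normal(loc=[3.0, 3.0], scale=0.2, size=(n_per_group, 2))
    points = np.vstack([group_a, group_b])
    return pd.DataFrame(points, columns=["x_scaled", "y_scaled"])


def plot_clusters(df):
    """Scatter plot colored by cluster label, mirroring the exercise template."""
    # Imported here so the clustering steps run without the plotting libraries
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.scatterplot(x="x_scaled", y="y_scaled", hue="cluster_labels", data=df)
    plt.show()


comic_con = make_stand_in_data()
features = comic_con[["x_scaled", "y_scaled"]].to_numpy()

# Generate cluster centers, asking for two clusters
cluster_centers, distortion = kmeans(features, 2)

# Assign cluster labels: each row gets the index of its nearest center
comic_con["cluster_labels"], distortion_list = vq(features, cluster_centers)
```

Calling plot_clusters(comic_con) afterwards draws the scatter plot from the template. The columns are converted with to_numpy() before clustering as a conservative choice, so the calls do not depend on how a given SciPy version handles DataFrame inputs.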



Before you can classify news articles, you need the basics of clustering. This chapter introduces unsupervised learning, a class of machine learning algorithms, and then clustering, one of its popular techniques. You will learn about two popular clustering techniques: hierarchical clustering and k-means clustering. The chapter concludes with the basic pre-processing steps to apply before you start clustering data.

Exercise 1: Unsupervised learning: basics
Exercise 2: Unsupervised learning in real world
Exercise 3: Pokémon sightings
Exercise 4: Basics of cluster analysis
Exercise 5: Pokémon sightings: hierarchical clustering
Exercise 6: Pokémon sightings: k-means clustering
Exercise 7: Data preparation for cluster analysis
Exercise 8: Normalize basic list data
Exercise 9: Visualize normalized data
Exercise 10: Normalization of small numbers
Exercise 11: FIFA 18: Normalize data

This chapter focuses on a popular clustering algorithm, hierarchical clustering, and its implementation in SciPy. Beyond the procedure itself, it helps you answer an important question: how many clusters are present in your data? The chapter concludes with the limitations of hierarchical clustering and considerations to keep in mind when using it.
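
As a rough sketch of what hierarchical clustering in SciPy can look like, the example below uses linkage() and fcluster() from scipy.cluster.hierarchy. The synthetic points and the choice of Ward linkage with two clusters are assumptions made for illustration, not details taken from the chapter.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical data: two well-separated groups of 2-D points
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.2, size=(30, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.2, size=(30, 2)),
])

# Build the merge tree bottom-up, here with Ward linkage on Euclidean distances
merge_tree = linkage(points, method="ward", metric="euclidean")

# Cut the tree so that at most two flat clusters remain (labels start at 1)
labels = fcluster(merge_tree, 2, criterion="maxclust")
```

Unlike kmeans(), this approach computes the full merge tree first, and the number of clusters is chosen only when the tree is cut.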

Exercise 1: Basics of hierarchical clustering
Exercise 2: Hierarchical clustering: ward method
Exercise 3: Hierarchical clustering: single method
Exercise 4: Hierarchical clustering: complete method
Exercise 5: Visualize clusters
Exercise 6: Visualize clusters with matplotlib
Exercise 7: Visualize clusters with seaborn
Exercise 8: How many clusters?
Exercise 9: Create a dendrogram
Exercise 10: How many clusters in Comic Con data?
Exercise 11: Limitations of hierarchical clustering
Exercise 12: Timing run of hierarchical clustering
Exercise 13: FIFA 18: exploring defenders

This chapter introduces a different clustering algorithm, k-means clustering, and its implementation in SciPy. K-means clustering overcomes the biggest drawback of hierarchical clustering discussed in the previous chapter. Because dendrograms are specific to hierarchical clustering, this chapter discusses one method for choosing the number of clusters before running k-means. The chapter concludes with the limitations of k-means clustering and considerations to keep in mind when using it.
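
The exercise titles for this chapter name the elbow method as the way to choose the number of clusters. A minimal sketch of that idea, assuming synthetic data with three groups and a hypothetical range of one to six clusters, is to run kmeans() for each candidate count and record the distortion it returns:

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Hypothetical data: three well-separated groups of 2-D points
rng = np.random.default_rng(7)
points = np.vstack([
    rng.normal(loc=center, scale=0.2, size=(40, 2))
    for center in ([0.0, 0.0], [4.0, 0.0], [2.0, 4.0])
])

# Run k-means for each candidate number of clusters and keep the distortion
cluster_counts = list(range(1, 7))
distortions = []
for k in cluster_counts:
    _, distortion = kmeans(points, k)
    distortions.append(distortion)
```

Plotting distortions against cluster_counts typically shows a sharp drop up to the true number of groups and a much flatter curve afterwards; the bend, or elbow, suggests a reasonable number of clusters.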

Exercise 1: Basics of k-means clustering
Exercise 2: K-means clustering: first exercise (current exercise)
Exercise 3: Runtime of k-means clustering
Exercise 4: How many clusters?
Exercise 5: Elbow method on distinct clusters
Exercise 6: Elbow method on uniform data
Exercise 7: Limitations of k-means clustering
Exercise 8: Impact of seeds on distinct clusters
Exercise 9: Uniform clustering patterns
Exercise 10: FIFA 18: defenders revisited

Now that you are familiar with two of the most popular clustering techniques, this chapter helps you apply them to real-world problems. It first covers finding the dominant colors in an image, then moves on to the problem raised in the introduction: clustering news articles. The chapter concludes with clustering on multiple variables, where visualizing all the data becomes difficult.

Exercise 1: Dominant colors in images
Exercise 2: Extract RGB values from image
Exercise 3: How many dominant colors?
Exercise 4: Display dominant colors
Exercise 5: Document clustering
Exercise 6: TF-IDF of movie plots
Exercise 7: Top terms in movie clusters
Exercise 8: Clustering with multiple features
Exercise 9: Clustering with many features
Exercise 10: Basic checks on clusters
Exercise 11: FIFA 18: what makes a complete player?
Exercise 12: Farewell!