Pokémon sightings: hierarchical clustering

We are going to continue the investigation into the sightings of legendary Pokémon from the previous exercise. Remember that in the scatter plot of the previous exercise, you identified two areas where Pokémon sightings were dense. This suggests that the points separate into two clusters. In this exercise, you will form two clusters of the sightings using hierarchical clustering.

'x' and 'y' are columns holding the X and Y coordinates of the sighting locations, stored in a pandas DataFrame, df. The following are available for use: matplotlib.pyplot as plt, seaborn as sns, and pandas as pd.

This exercise is part of the course

<cours>Cluster Analysis in Python</cours>

Exercise instructions

Import the linkage and fcluster functions.
Use the linkage() function to compute distances using the ward method.
Generate cluster labels for each data point with two clusters using the fcluster() function.
Plot the points with seaborn and assign a different color to each cluster.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import linkage and fcluster functions
from scipy.cluster.hierarchy import ____, ____

# Use the linkage() function to compute distance
Z = ____(____, 'ward')

# Generate cluster labels
df['cluster_labels'] = ____(____, ____, criterion='maxclust')

# Plot the points with seaborn
sns.scatterplot(x=____, y=____, hue=____, data=df)
plt.show()
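
For reference, one way the blanks above might be filled in is sketched below. The real sightings data is not reproduced here, so the DataFrame is a made-up stand-in with two well-separated groups of points, and the coordinates are passed to linkage() as a NumPy array. The plot_clusters helper is a hypothetical name introduced only for this sketch.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Assumption: synthetic stand-in for the sightings data (two dense areas).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(10.0, 10.0), scale=1.0, size=(20, 2))
blob_b = rng.normal(loc=(30.0, 30.0), scale=1.0, size=(20, 2))
df = pd.DataFrame(np.vstack([blob_a, blob_b]), columns=["x", "y"])

# Build the linkage matrix from the coordinates using Ward's method.
Z = linkage(df[["x", "y"]].to_numpy(), "ward")

# Cut the tree so that at most two flat clusters are formed.
df["cluster_labels"] = fcluster(Z, 2, criterion="maxclust")


def plot_clusters(data):
    """Hypothetical helper: scatter plot colored by cluster label."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.scatterplot(x="x", y="y", hue="cluster_labels", data=data)
    plt.show()
```

Calling plot_clusters(df) draws the scatter plot with one color per cluster. With criterion='maxclust', the second argument of fcluster() is the maximum number of clusters rather than a distance threshold.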

Before you are ready to classify news articles, you need to be introduced to the basics of clustering. This chapter familiarizes you with a class of machine learning algorithms called unsupervised learning and then introduces you to clustering, one of the popular unsupervised learning algorithms. You will learn about two popular clustering techniques - hierarchical clustering and k-means clustering. The chapter concludes with basic pre-processing steps to apply before you start clustering data.

Exercise 1: Unsupervised learning: basics
Exercise 2: Unsupervised learning in the real world
Exercise 3: Pokémon sightings
Exercise 4: Basics of cluster analysis
Exercise 5: Pokémon sightings: hierarchical clustering (current exercise)
Exercise 6: Pokémon sightings: k-means clustering
Exercise 7: Data preparation for cluster analysis
Exercise 8: Normalize basic list data
Exercise 9: Visualize normalized data
Exercise 10: Normalization of small numbers
Exercise 11: FIFA 18: Normalize data
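
The pre-processing step mentioned in the chapter description above is typically a normalization of each feature. A minimal sketch, assuming SciPy's whiten() function is used for this and using made-up numbers, could look like the following.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Assumption: made-up observations. Column 0 is on a small scale,
# column 1 on a much larger one.
data = np.array([
    [1.0, 1000.0],
    [2.0, 3000.0],
    [3.0, 2000.0],
    [4.0, 5000.0],
    [5.0, 4000.0],
])

# whiten() divides each column by its standard deviation, so that
# every feature ends up with unit variance.
scaled = whiten(data)

# Each column of the scaled data should now have a standard deviation of 1.
print(scaled.std(axis=0))
```

Without this rescaling, the large-scale column would dominate the distance computations that clustering relies on.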

This chapter focuses on a popular clustering algorithm - hierarchical clustering - and its implementation in SciPy. In addition to the procedure to perform hierarchical clustering, it attempts to help you answer an important question - how many clusters are present in your data? The chapter concludes with a discussion on the limitations of hierarchical clustering and discusses considerations while using hierarchical clustering.

Exercise 1: Basics of hierarchical clustering
Exercise 2: Hierarchical clustering: ward method
Exercise 3: Hierarchical clustering: single method
Exercise 4: Hierarchical clustering: complete method
Exercise 5: Visualize clusters
Exercise 6: Visualize clusters with matplotlib
Exercise 7: Visualize clusters with seaborn
Exercise 8: How many clusters?
Exercise 9: Create a dendrogram
Exercise 10: How many clusters in comic con data?
Exercise 11: Limitations of hierarchical clustering
Exercise 12: Timing run of hierarchical clustering
Exercise 13: FIFA 18: exploring defenders
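
To illustrate the "how many clusters?" question raised in the chapter description above, here is a rough sketch that builds a dendrogram structure with SciPy and inspects the heights of the last merges. The three-group data is made up, and reading the cluster count from the jump in merge heights is a heuristic used for this sketch, not necessarily the course's exact procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Assumption: synthetic data with three well-separated groups.
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(15, 2)),
    rng.normal(loc=(10.0, 0.0), scale=0.5, size=(15, 2)),
    rng.normal(loc=(5.0, 10.0), scale=0.5, size=(15, 2)),
])

Z = linkage(points, method="ward")

# no_plot=True returns the tree layout as a dict without drawing it;
# omit it (with matplotlib installed) to draw the dendrogram instead.
tree = dendrogram(Z, no_plot=True)

# Column 2 of Z holds the merge heights. A large jump between
# consecutive heights hints at a natural number of clusters.
last_merges = Z[-4:, 2]
print(last_merges)
```

Here the last two merges join the three groups, so their heights should be much larger than those of the merges inside each group.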

This chapter introduces a different clustering algorithm - k-means clustering - and its implementation in SciPy. K-means clustering overcomes the biggest drawback of hierarchical clustering that was discussed in the last chapter. As dendrograms are specific to hierarchical clustering, this chapter discusses one method to find the number of clusters before running k-means clustering. The chapter concludes with a discussion on the limitations of k-means clustering and discusses considerations while using this algorithm.

Exercise 1: Basics of k-means clustering
Exercise 2: K-means clustering: first exercise
Exercise 3: Runtime of k-means clustering
Exercise 4: How many clusters?
Exercise 5: Elbow method on distinct clusters
Exercise 6: Elbow method on uniform data
Exercise 7: Limitations of k-means clustering
Exercise 8: Impact of seeds on distinct clusters
Exercise 9: Uniform clustering patterns
Exercise 10: FIFA 18: defenders revisited
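
As a rough illustration of k-means clustering in SciPy and of the elbow method named in the exercise list above, the sketch below runs kmeans() for several values of k and records the distortion for each. The data is made up, and because kmeans() starts from random observations, the exact numbers can vary slightly between runs.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Assumption: synthetic data with three well-separated groups.
rng = np.random.default_rng(2)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(30, 2)),
    rng.normal(loc=(10.0, 0.0), scale=0.5, size=(30, 2)),
    rng.normal(loc=(5.0, 10.0), scale=0.5, size=(30, 2)),
])

# Normalize each feature before clustering.
scaled = whiten(points)

# Elbow method: record the distortion for an increasing number of clusters.
distortions = []
for k in range(1, 7):
    _, distortion = kmeans(scaled, k)
    distortions.append(distortion)

# Fit the chosen number of clusters and assign each point to a centroid.
centroids, _ = kmeans(scaled, 3)
labels, _ = vq(scaled, centroids)

print(distortions)
```

Plotting distortions against k would typically show a sharp bend (the "elbow") at the number of clusters present in the data, here around k = 3.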

Now that you are familiar with two of the most popular clustering techniques, this chapter helps you apply this knowledge to real-world problems. The chapter first discusses the process of finding dominant colors in an image, before moving on to the problem discussed in the introduction - clustering of news articles. The chapter concludes with a discussion on clustering with multiple variables, which makes it difficult to visualize all the data.

Exercise 1: Dominant colors in images
Exercise 2: Extract RGB values from image
Exercise 3: How many dominant colors?
Exercise 4: Display dominant colors
Exercise 5: Document clustering
Exercise 6: TF-IDF of movie plots
Exercise 7: Top terms in movie clusters
Exercise 8: Clustering with multiple features
Exercise 9: Clustering with many features
Exercise 10: Basic checks on clusters
Exercise 11: FIFA 18: what makes a complete player?
Exercise 12: Farewell!