Basic checks on clusters

In the FIFA 18 dataset, the previous exercises concentrated on defenders. Let us now focus on a player's attacking attributes. Pace (pac), dribbling (dri) and shooting (sho) are features found in attack-minded players. In this exercise, k-means clustering has already been applied to the data using the scaled values of these three attributes. Run a few basic checks on the resulting clusters.

The data is stored in a pandas DataFrame, fifa. The names of the scaled columns are held in a list, scaled_features. The cluster labels are stored in the cluster_labels column. Recall that the pandas .count() and .mean() methods give you the number of observations and the mean of the observations in a DataFrame.
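
The setup described above could have been produced along the following lines. This is only a minimal sketch under stated assumptions: the toy values, the scaled_ column prefix and the choice of two clusters are hypothetical, and the exercise itself does not show how fifa was prepared. It uses whiten, kmeans and vq from scipy.cluster.vq.

```python
import pandas as pd
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical toy data standing in for the real FIFA 18 table
fifa = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "pac": [90, 88, 92, 45, 50, 48],
    "dri": [85, 87, 90, 40, 42, 45],
    "sho": [80, 82, 86, 30, 35, 33],
    "eur_wage": [200000, 180000, 250000, 20000, 25000, 22000],
})

features = ["pac", "dri", "sho"]
# Assumed naming scheme for the scaled columns
scaled_features = ["scaled_" + name for name in features]

# whiten divides each column by its standard deviation
scaled = whiten(fifa[features].to_numpy(dtype=float))
for i, name in enumerate(scaled_features):
    fifa[name] = scaled[:, i]

# Fit k-means with an assumed k of 2 on the scaled attributes
observations = fifa[scaled_features].to_numpy()
cluster_centers, distortion = kmeans(observations, 2)

# Assign each player to the nearest cluster center
fifa["cluster_labels"], _ = vq(observations, cluster_centers)

print(fifa[["ID", "cluster_labels"]])
```

The numeric label given to each group can differ between runs, because k-means starts from randomly chosen centers.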

This exercise is part of the course

<cours>Cluster Analysis in Python</cours>

Exercise instructions

Print the size of the clusters by grouping on the cluster_labels column.
Print the mean wage of the players in each cluster. eur_wage is the name of the column that contains a player's wage in euros.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Print the size of the clusters
print(fifa.____(____)['ID'].count())

# Print the mean value of wages in each cluster
print(fifa.____(____)['eur_wage'].____())
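
One way the two checks might look once completed, shown on a small made-up DataFrame. The values below are hypothetical and only mimic the columns named in the exercise; the pattern is a pandas groupby on cluster_labels followed by .count() or .mean().

```python
import pandas as pd

# Hypothetical stand-in for fifa with labels already assigned
fifa = pd.DataFrame({
    "ID": [101, 102, 103, 104, 105],
    "cluster_labels": [0, 0, 1, 1, 1],
    "eur_wage": [10000, 30000, 60000, 80000, 100000],
})

# Size of each cluster: count the player IDs per label
cluster_sizes = fifa.groupby("cluster_labels")["ID"].count()
print(cluster_sizes)

# Mean wage of the players in each cluster
mean_wages = fifa.groupby("cluster_labels")["eur_wage"].mean()
print(mean_wages)
```

On this toy data, cluster 0 holds two players with a mean wage of 20000.0 and cluster 1 holds three players with a mean wage of 80000.0. Very uneven cluster sizes or near-identical means would be a hint that the clustering deserves a closer look.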

Before you are ready to classify news articles, you need to be introduced to the basics of clustering. This chapter familiarizes you with a class of machine learning algorithms called unsupervised learning and then introduces clustering, one of the most popular unsupervised learning techniques. You will learn about two popular clustering techniques - hierarchical clustering and k-means clustering. The chapter concludes with the basic pre-processing steps to carry out before you start clustering data.

Exercise 1: Unsupervised learning: basics
Exercise 2: Unsupervised learning in real world
Exercise 3: Pokémon sightings
Exercise 4: Basics of cluster analysis
Exercise 5: Pokémon sightings: hierarchical clustering
Exercise 6: Pokémon sightings: k-means clustering
Exercise 7: Data preparation for cluster analysis
Exercise 8: Normalize basic list data
Exercise 9: Visualize normalized data
Exercise 10: Normalization of small numbers
Exercise 11: FIFA 18: Normalize data

This chapter focuses on a popular clustering algorithm - hierarchical clustering - and its implementation in SciPy. In addition to the procedure for performing hierarchical clustering, it helps you answer an important question - how many clusters are present in your data? The chapter concludes with a discussion of the limitations of hierarchical clustering and of the considerations to keep in mind when using it.

Exercise 1: Basics of hierarchical clustering
Exercise 2: Hierarchical clustering: ward method
Exercise 3: Hierarchical clustering: single method
Exercise 4: Hierarchical clustering: complete method
Exercise 5: Visualize clusters
Exercise 6: Visualize clusters with matplotlib
Exercise 7: Visualize clusters with seaborn
Exercise 8: How many clusters?
Exercise 9: Create a dendrogram
Exercise 10: How many clusters in comic con data?
Exercise 11: Limitations of hierarchical clustering
Exercise 12: Timing run of hierarchical clustering
Exercise 13: FIFA 18: exploring defenders

This chapter introduces a different clustering algorithm - k-means clustering - and its implementation in SciPy. K-means clustering overcomes the biggest drawback of hierarchical clustering discussed in the previous chapter. Because dendrograms are specific to hierarchical clustering, this chapter presents one method for finding the number of clusters before running k-means clustering. The chapter concludes with a discussion of the limitations of k-means clustering and of the considerations to keep in mind when using it.

Exercise 1: Basics of k-means clustering
Exercise 2: K-means clustering: first exercise
Exercise 3: Runtime of k-means clustering
Exercise 4: How many clusters?
Exercise 5: Elbow method on distinct clusters
Exercise 6: Elbow method on uniform data
Exercise 7: Limitations of k-means clustering
Exercise 8: Impact of seeds on distinct clusters
Exercise 9: Uniform clustering patterns
Exercise 10: FIFA 18: defenders revisited

Now that you are familiar with two of the most popular clustering techniques, this chapter helps you apply this knowledge to real-world problems. The chapter first discusses the process of finding dominant colors in an image, before moving on to the problem discussed in the introduction - clustering of news articles. The chapter concludes with a discussion on clustering with multiple variables, which makes it difficult to visualize all the data.

Exercise 1: Dominant colors in images
Exercise 2: Extract RGB values from image
Exercise 3: How many dominant colors?
Exercise 4: Display dominant colors
Exercise 5: Document clustering
Exercise 6: TF-IDF of movie plots
Exercise 7: Top terms in movie clusters
Exercise 8: Clustering with multiple features
Exercise 9: Clustering with many features
Exercise 10: Basic checks on clusters (current exercise)
Exercise 11: FIFA 18: what makes a complete player?
Exercise 12: Farewell!