Introduction to the Pokemon data
1. Introduction to the Pokemon data
So far, you have applied what you have learned about the 'kmeans' algorithm to synthetic data. In this final set of exercises, you will apply your learning to a 'real world' data set.
2. "Real" data exercise
This data describes 800 Pokemon from the Pokemon games (sorry, this isn't from Pokemon Go). This may not be a dataset for which you have already built intuition. That is normal in data science. You may want to gain some intuition about the data by researching Pokemon in the Pokemon games or, as we will have you do, by poking around in the data.
3. The Pokemon dataset
The data was originally collected by Alberto Barradas and is hosted on Kaggle at the address on the screen. The data contains six features for each Pokemon: Hit Points, Attack, Defense, Special Attack, Special Defense, and Speed. This is unlabeled data because there is not a single outcome that we want to predict, just some measurements of each Pokemon's abilities. For the data curious, more information on Pokemon and these features can be found at the second address on the screen. Along with exploring the data, this is another way to build intuition about the data.
4. Data challenges
In the next set of exercises, you will complete multiple steps that are typical in handling real world data. The first is determining which variables to use for clustering: it is important to consider which features should be used in the clustering exercise. Sometimes trying multiple subsets of features is an important step in finding patterns in the data. The next, and something we will defer to a later chapter, is scaling the data. If the features being used in modeling are in different units or on different scales, the data is often scaled to a common measure in order to improve the insights gained from unsupervised learning. The third is choosing the number of clusters. In this example, you will be finding homogeneous subgroups of Pokemon. The number of clusters is not known beforehand, so you will have to make a determination. In real world data, a nice clean elbow on the scree plot rarely exists, so as an analyst, you will have to use some judgement in this step. Finally, a common output of any analysis exercise is a visual representation of the outcomes. This can also help you gain some additional intuition into the data and the resulting models.
5. Let's practice!
This may seem like a lot, but we'll guide you through step by step, providing hints and templates all along the way. Let's practice.
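The steps outlined under Data challenges (choosing features, scaling them, and judging the number of clusters from a scree plot) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the course's own code: the course works with the 'kmeans' algorithm, while this sketch is a small hand-written k-means in plain Python. The function names `standardize`, `kmeans`, and `best_kmeans` are hypothetical, and the rows are made-up toy values, not records from the real Pokemon dataset.

```python
import random
import statistics

# Hypothetical toy rows (NOT from the real dataset), one list per Pokemon,
# in the order: HitPoints, Attack, Defense, SpecialAttack, SpecialDefense, Speed.
rows = [
    [40, 45, 50, 60, 60, 45],
    [42, 50, 48, 55, 58, 50],
    [60, 65, 62, 80, 78, 62],
    [58, 60, 66, 82, 80, 58],
    [85, 90, 88, 105, 100, 82],
    [90, 95, 85, 110, 98, 85],
    [50, 110, 40, 45, 45, 120],
    [55, 105, 45, 40, 50, 115],
]


def standardize(data):
    """Scale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*data))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(data, k, n_iter=50, seed=0):
    """One run of Lloyd's algorithm; returns (labels, total within-cluster SS)."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)  # start from k randomly chosen observations
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in data]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(
                [statistics.mean(c) for c in zip(*members)] if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in data]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    return labels, wss


def best_kmeans(data, k, n_start=20):
    """Repeat k-means from several random starts and keep the lowest-WSS run."""
    return min((kmeans(data, k, seed=s) for s in range(n_start)), key=lambda r: r[1])


scaled = standardize(rows)

# Total within-cluster sum of squares for each candidate k: the numbers a
# scree plot would display, and where one would look for an "elbow".
wss_by_k = {k: best_kmeans(scaled, k)[1] for k in range(1, 6)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

Plotting `wss_by_k` against k gives the scree plot. Running from several random starts matters because a single run of k-means can settle in a poor local solution, which would distort the shape of the curve.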