Manipulating data
1. Manipulating data
In this lesson, you will manipulate and visualize your data!
2. Groupby
In chapter 3, you used groupby() to calculate the mean for each partition of data. But what if you want to calculate more than one statistic?
3. Groupby aggregate
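As a minimal sketch of computing more than one statistic per group in a single call, the snippet below uses agg() with a list of function names. The DataFrame, its column names (status, amount), and its values are made up for illustration and are not taken from the course slides.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "status": ["open", "open", "closed", "closed"],
    "amount": [10.0, 30.0, 5.0, 15.0],
})

# One groupby call, two statistics per group: the list passed to agg()
# becomes the column labels of the result.
summary = df.groupby("status")["amount"].agg(["mean", "max"])
print(summary)
```

The result has one row per group and one column per function, so adding another statistic only means adding another name to the list.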
Instead of writing multiple groupby statements, you can use the agg() method and pass a list of functions, such as mean and max, as shown here on the slide.
4. Dummy variables
Although we haven't covered fitting machine learning models in this course, one thing to remember is that before fitting a model you need to make sure the categorical variables in your data are recoded into dummy variables, that is, each category becomes its own column of 0s and 1s. This is sometimes also called one-hot encoding. Here's an example DataFrame with string and numeric columns.
5. Get dummies
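The slide's DataFrame is not reproduced in this transcript, so here is a small sketch with made-up values: a string column named status and a numeric column named age, passed to pd.get_dummies(). The dtype=int argument is an assumption added so the dummy columns come out as 0/1 integers; recent pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical data: one string column and one numeric column.
df = pd.DataFrame({
    "status": ["single", "married", "single"],
    "age": [25, 40, 31],
})

# Encode the string column as dummy variables; the numeric column is
# passed through unchanged. dtype=int requests 0/1 integers.
dummies = pd.get_dummies(df, dtype=int)
print(dummies)
```

With two distinct values in status, the result has two new columns, status_married and status_single, alongside the untouched age column.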
The get_dummies() function from pandas returns a new DataFrame in which the non-numeric columns (strings and categoricals) are encoded as dummy variables, while the numeric columns are left as they are. As you can see here, the status column is recoded into two separate columns.
6. Let's practice!
Time to work on the final set of exercises!