
Manipulating data

1. Manipulating data

In this lesson, you will manipulate and visualize your data!

2. Groupby

In chapter 3, you used groupby() to calculate the mean for each group in your data. But what if you want to calculate more than one statistic?

3. Groupby aggregate

Instead of writing a separate groupby statement for each statistic, you can call the agg() method once and pass it a list of functions, such as mean and max, as shown here on the slide.
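A minimal sketch of the two approaches side by side. The DataFrame, its column names (`store`, `sales`) and its values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data: sales amounts recorded for two stores
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "sales": [10, 20, 5, 15, 40],
})

# The long way: one groupby statement per statistic
means = df.groupby("store")["sales"].mean()
maxes = df.groupby("store")["sales"].max()

# The agg() way: one groupby, a list of functions, one column per function
summary = df.groupby("store")["sales"].agg(["mean", "max"])
print(summary)
# store A: mean 15.0, max 20; store B: mean 20.0, max 40
```

Both approaches produce the same numbers. agg() simply collects them into a single DataFrame with one column per function.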

4. Dummy variables

Although we haven't covered fitting machine learning models in this course, one thing to remember is that before fitting a model you need to make sure the categorical variables in your data are recoded into dummy variables. That is, each category becomes its own column of 0/1 integer indicators. This is sometimes also called one-hot encoding. Here's an example DataFrame with string and numeric columns.
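To make the idea concrete, here is a sketch of one-hot encoding done by hand, without any special pandas function. The `status` and `age` columns and their values are hypothetical stand-ins for the string and numeric columns on the slide.

```python
import pandas as pd

# Hypothetical example data: one string (categorical) column, one numeric column
df = pd.DataFrame({
    "status": ["active", "inactive", "active"],
    "age": [34, 51, 27],
})

# One-hot encoding by hand: for each distinct category, add a column that is
# 1 where the row has that category and 0 everywhere else
for value in sorted(df["status"].unique()):
    df[f"status_{value}"] = (df["status"] == value).astype(int)

# Drop the original string column, keeping only numeric columns
encoded = df.drop(columns="status")
print(encoded)
```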

5. Get dummies

The get_dummies() function from pandas returns a new DataFrame in which the non-numeric columns are replaced by dummy variables. As you can see here, the status column is recoded into two separate columns, one for each of its values.
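A short sketch of the same encoding with get_dummies(), again using a hypothetical DataFrame. Passing `dtype=int` is a choice made here so that the dummies come out as 0/1 integers. Recent pandas versions default to boolean dummy columns.

```python
import pandas as pd

# Hypothetical example data: one string column, one numeric column
df = pd.DataFrame({
    "status": ["active", "inactive", "active"],
    "age": [34, 51, 27],
})

# By default get_dummies() encodes the object/string/category columns and
# leaves numeric columns such as "age" unchanged
dummies = pd.get_dummies(df, dtype=int)
print(dummies.columns.tolist())  # ['age', 'status_active', 'status_inactive']
```

The new columns are named after the original column and the category value. If you only want k - 1 columns for a variable with k categories, get_dummies() also accepts `drop_first=True`.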

6. Let's practice!

Time to work on the final set of exercises!
