Calculating on a pivot table
Pivot tables are filled with summary statistics, but they are only a first step to finding something insightful. Often you'll need to perform further calculations on them. A common thing to do is to find the rows or columns where the highest or lowest value occurs.
Recall from Chapter 1 that you can easily subset a Series or DataFrame to find rows of interest using a logical condition inside of square brackets. For example: series[series > value]
.
pandas
is loaded as pd
and the DataFrame temp_by_country_city_vs_year
is available.
The .head()
for this DataFrame is shown below, with only a few of the year columns displayed:
country | city | 2000 | 2001 | 2002 | … | 2013 |
---|---|---|---|---|---|---|
Afghanistan | Kabul | 15.823 | 15.848 | 15.715 | … | 16.206 |
Angola | Luanda | 24.410 | 24.427 | 24.791 | … | 24.554 |
Australia | Melbourne | 14.320 | 14.180 | 14.076 | … | 14.742 |
Sydney | 17.567 | 17.854 | 17.734 | … | 18.090 | |
Bangladesh | Dhaka | 25.905 | 25.931 | 26.095 | … | 26.587 |
This is a part of the course
“Data Manipulation with pandas”
Exercise instructions
- Calculate the mean temperature for each year, assigning to
mean_temp_by_year
. - Filter
mean_temp_by_year
for the year that had the highest mean temperature. - Calculate the mean temperature for each city (across columns), assigning to
mean_temp_by_city
. - Filter
mean_temp_by_city
for the city that had the lowest mean temperature.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Get the worldwide mean temp by year
mean_temp_by_year = temp_by_country_city_vs_year.____
# Filter for the year that had the highest mean temp
print(mean_temp_by_year[____])
# Get the mean temp by city
mean_temp_by_city = temp_by_country_city_vs_year.____
# Filter for the city that had the lowest mean temp
print(mean_temp_by_city[____])
This exercise is part of the course
Data Manipulation with pandas
Learn how to import and clean data, calculate statistics, and create visualizations with pandas.
Indexes are supercharged row and column names. Learn how they can be combined with slicing for powerful DataFrame subsetting.
Exercise 1: Explicit indexesExercise 2: Setting and removing indexesExercise 3: Subsetting with .loc[]Exercise 4: Setting multi-level indexesExercise 5: Sorting by index valuesExercise 6: Slicing and subsetting with .loc and .ilocExercise 7: Slicing index valuesExercise 8: Slicing in both directionsExercise 9: Slicing time seriesExercise 10: Subsetting by row/column numberExercise 11: Working with pivot tablesExercise 12: Pivot temperature by city and yearExercise 13: Subsetting pivot tablesExercise 14: Calculating on a pivot tableWhat is DataCamp?
Learn the data skills you need online at your own pace—from non-coding essentials to data science and machine learning.