Rolling 360-day mean & std. deviation for NYC ozone data since 2000
The last video also showed you how to calculate several rolling statistics using the .agg() method, similar to .groupby().
Let's take a closer look at the air quality history of NYC using the Ozone data you have seen before. The daily data are very volatile, so a longer-term rolling average can help reveal the underlying trend.
You'll be using a 360-day rolling window and .agg() to calculate the rolling mean and standard deviation of the daily average ozone values since 2000.
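To make the mechanics concrete, here is a minimal sketch of .rolling() combined with .agg() on a small synthetic series. The values and the 3-period window are assumptions chosen to keep the output easy to check by hand; the exercise itself uses the ozone data and a 360-period window.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the ozone data (values are made up)
idx = pd.date_range("2000-01-01", periods=10, freq="D")
s = pd.Series(np.arange(10, dtype=float), index=idx, name="Ozone")

# A 3-period window keeps the example small; the exercise uses 360
rolling_stats = s.rolling(3).agg(["mean", "std"])

# The result is a DataFrame with one column per statistic;
# the first window - 1 rows are NaN because the window is not yet full
print(rolling_stats.head())
```

Passing a list of names to .agg() returns one column per statistic, which is what makes the later .join() with the original data straightforward.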
This exercise is part of the course
Manipulating Time Series Data in Python
Exercise instructions
We have already imported pandas as pd, and matplotlib.pyplot as plt.
- Use pd.read_csv() to import 'ozone.csv', creating a DatetimeIndex from the 'date' column using parse_dates and index_col, assign the result to data, and drop missing values using .dropna().
- Select the 'Ozone' column and create a .rolling() window using 360 periods, apply .agg() to calculate the mean and std, and assign this to rolling_stats.
- Use .join() to concatenate data with rolling_stats, and assign to stats.
- Plot stats using subplots.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import and inspect ozone data here
data = ____
# Calculate the rolling mean and std here
rolling_stats = ____
# Join rolling_stats with ozone data
stats = ____
# Plot stats
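
One way the steps in the instructions could fit together is sketched below. This is not the course's reference solution: the helper names rolling_ozone_stats and plot_stats are hypothetical, and because 'ozone.csv' is not available here, the sketch builds a synthetic stand-in with assumed 'date' and 'Ozone' columns and random values.

```python
import io

import numpy as np
import pandas as pd


def rolling_ozone_stats(csv_source, window=360):
    """Return the ozone data joined with its rolling mean and std (hypothetical helper)."""
    # Parse 'date' and use it as a DatetimeIndex, then drop rows with missing values
    data = pd.read_csv(csv_source, parse_dates=["date"], index_col="date").dropna()
    # Rolling window over the 'Ozone' column, aggregated two ways at once
    rolling_stats = data["Ozone"].rolling(window).agg(["mean", "std"])
    # Join on the shared date index: columns become Ozone, mean, std
    return data.join(rolling_stats)


def plot_stats(stats):
    """Plot each column in its own subplot (requires matplotlib)."""
    import matplotlib.pyplot as plt

    stats.plot(subplots=True)
    plt.show()


# Stand-in for 'ozone.csv': synthetic values, not real measurements
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=400, freq="D")
fake = pd.DataFrame({"date": dates, "Ozone": rng.uniform(0.0, 0.1, size=400)})
buffer = io.StringIO(fake.to_csv(index=False))

stats = rolling_ozone_stats(buffer)
print(stats.tail())
```

With the real file, rolling_ozone_stats('ozone.csv') followed by plot_stats(stats) would cover all four steps. Note that the first 359 rows of the mean and std columns are NaN, since a 360-period window needs 360 observations before it produces a value.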