
Summarize the values in your time series data

1. Summarizing the values in your time series data

While displaying and annotating time series data is extremely helpful when sharing information, it is also critical that you collect summary statistics of any time series that you are working with. Doing so will allow you to share and discuss statistical properties of your data that can further support the plots that you generate and any hypotheses that you want to communicate.

2. Obtaining numerical summaries of your data

How many times have you found yourself in a situation where someone asks you "What is the average value of this data?" or "What is the maximum value observed in this time series?" Obtaining these numbers can be critical to understanding the data you are working with, or to communicating the characteristics of your data to others.

3. Obtaining numerical summaries of your data

The .describe() method in pandas enables you to obtain summary statistics of all numeric columns in a DataFrame. This is an extremely useful feature, as it allows you to quickly gain insight into broad statistics of your data. By default, the method computes summary statistics for numerical columns only. It returns a number of relevant statistics, including the number of observations in the column, the mean and standard deviation of its values, the minimum and maximum, and the 25th, 50th, and 75th percentiles.
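As a minimal sketch of this call, assume a small hypothetical DataFrame. The `co2` values and the `station` column below are made up for illustration and are not the course data.

```python
import pandas as pd

# Hypothetical weekly readings; the values are illustrative only
dates = pd.date_range("2000-01-01", periods=8, freq="7D")
df = pd.DataFrame(
    {
        "co2": [316.1, 317.3, 317.6, 317.5, 316.4, 316.9, 317.2, 318.0],
        "station": ["A"] * 8,  # non-numeric column, left out by describe()
    },
    index=dates,
)

# One row per statistic: count, mean, std, min, 25%, 50%, 75%, max
summary = df.describe()
print(summary)
```

Individual statistics can then be looked up by label, for example `summary.loc["mean", "co2"]`.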

4. Summarizing your data with boxplots

If getting point estimates of the numerical values in your data is not sufficient, you can also leverage the .boxplot() method to visualize the distribution of your data. A boxplot provides information on the shape, variability, and median of your data. It is particularly useful for displaying the range of your data and for identifying any potential outliers.
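A boxplot call might look like the sketch below. The DataFrame is a synthetic stand-in for the CO2 data (random values around 340), and the axis label and title are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for the CO2 data (an assumption, not the real series)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"co2": 340 + rng.normal(0, 5, size=200)},
    index=pd.date_range("2000-01-01", periods=200, freq="7D"),
)

# Draw one box per numeric column; the call returns a matplotlib Axes
ax = df.boxplot()
ax.set_ylabel("CO2")
ax.set_title("Boxplot of the co2 column")
```

In an interactive session, calling `matplotlib.pyplot.show()` afterwards displays the figure.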

5. A boxplot of the values in the CO2 data

The lines extending from the boxes are commonly referred to as "whiskers". They indicate variability outside the upper quartile (the 75th percentile) and the lower quartile (the 25th percentile). Values that fall beyond the whiskers are treated as outliers, and are usually plotted as individual dots in line with the whiskers.
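To make the link between quartiles, whiskers, and outliers concrete, here is a small numerical sketch. It assumes the common 1.5 × IQR convention for the whisker limits, which is also matplotlib's default, and uses made-up values with one deliberately extreme reading.

```python
import pandas as pd

# Hypothetical readings; 330.0 is an intentionally extreme value
co2 = pd.Series([316.1, 317.3, 317.6, 317.5, 316.4, 316.9, 317.2, 330.0])

q1 = co2.quantile(0.25)  # lower quartile (25th percentile)
q3 = co2.quantile(0.75)  # upper quartile (75th percentile)
iqr = q3 - q1            # interquartile range: the height of the box

# Whiskers reach the furthest points still inside these limits
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Anything beyond the limits would be drawn as an individual dot
outliers = co2[(co2 < lower_limit) | (co2 > upper_limit)]
print(outliers)
```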

6. Summarizing your data with histograms

Another way to produce visual summaries of the values of a column in a pandas DataFrame is to use histogram plots. Histograms are a type of plot that allows you to inspect the underlying distribution of your data. These can sometimes be more useful than boxplots, as non-technical members of your team will often be more familiar with histograms, and are therefore more likely to quickly understand the shape of the data you are exploring or presenting to them. In pandas, you can produce a histogram by simply using the standard .plot() method and specifying the kind argument as hist. In addition, you can specify the bins parameter, which determines how many intervals your data is cut into.
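A histogram call along these lines could be sketched as follows. The data is again a synthetic stand-in for the CO2 series, and `bins=50` is an arbitrary illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for the CO2 data (an assumption, not the real series)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"co2": 340 + rng.normal(0, 5, size=500)},
    index=pd.date_range("2000-01-01", periods=500, freq="7D"),
)

# kind="hist" selects a histogram; bins sets the number of intervals
ax = df.plot(kind="hist", bins=50)
ax.set_xlabel("CO2")
ax.set_ylabel("Number of observations")
```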

7. A histogram plot of the values in the CO2 data

There are no hard and fast rules to find the optimal number for the bins parameter, and this often needs to be found through trial and error.
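One way to approach this trial and error is to draw the same data with several bin counts side by side and compare them. The sketch below does that with synthetic data; the three bin counts are arbitrary examples, not recommendations.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the CO2 column (an assumption)
rng = np.random.default_rng(0)
co2 = pd.Series(340 + rng.normal(0, 5, size=500), name="co2")

bin_counts = [5, 30, 150]  # too coarse, moderate, very fine
fig, axes = plt.subplots(1, len(bin_counts), figsize=(12, 3))

# Plot the same series once per candidate bin count
for ax, bins in zip(axes, bin_counts):
    co2.plot(kind="hist", bins=bins, ax=ax, title=f"bins={bins}")

fig.tight_layout()
```

With very few bins the shape is hidden, and with very many the plot becomes noisy; a value in between usually reads best.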

8. Summarizing your data with density plots

Since it can be tedious to identify the optimal number of bins, histograms can be a cumbersome way to assess the distribution of your data. Instead, you can rely on kernel density plots to view the distribution of your data. Kernel density plots are a variation of histograms. They use kernel smoothing to plot the values of your data, which produces smoother distributions by dampening the effect of noise and outliers while still showing where the mass of your data is located. It is simple to generate density plots with the pandas library, as you only need to use the standard .plot() method while specifying the kind argument as density.
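A density plot call could be sketched as below, again on synthetic stand-in data. Note that pandas relies on SciPy for the kernel density estimate, so this assumes SciPy is installed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for the CO2 data (an assumption, not the real series)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"co2": 340 + rng.normal(0, 5, size=500)},
    index=pd.date_range("2000-01-01", periods=500, freq="7D"),
)

# kind="density" draws a kernel density estimate (requires SciPy)
ax = df.plot(kind="density")
ax.set_xlabel("CO2")
```

Unlike the histogram, no bins argument is needed here; the result is one smooth curve per numeric column.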

9. A density plot of the values in the CO2 data

And this is what a density plot looks like.

10. Let's practice!

Time for you to generate your own summary graphs!