
Clean your time series data

1. Clean your time series data

As with any other task in data science, it is good practice to perform some exploratory analysis of your time series data before moving on to more sophisticated tasks. In this chapter, we will discuss how to explore and clean your data in more depth, and how to produce statistical summaries of your time series data.

2. The CO2 level time series

This chapter introduces a new dataset that is well known in the time series community. It contains CO2 measurements taken at the Mauna Loa Observatory, Hawaii, between 1958 and 2001.

3. Finding missing values in a DataFrame

In real-life situations, data often arrives in messy or noisy formats. "Noise" in data can include outliers, misformatted data points and missing values. To analyze your data adequately, it is important to process and clean it carefully. While this may seem to slow down your analysis at first, the investment pays off later and can noticeably speed up your investigative analysis. The first step is to check your data for missing values. In pandas, missing values in a DataFrame can be found with the .isnull() method. Conversely, non-missing values can be found with the .notnull() method. Both methods return a DataFrame of True/False values with the same shape as the original: .isnull() is True wherever a value is missing, and .notnull() is True wherever a value is present.
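A minimal sketch of the two methods on a tiny DataFrame. The column name co2 and all values are made up for illustration; they are not real Mauna Loa readings. Note that both methods work element by element, not row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly series with two gaps (values are illustrative only)
df = pd.DataFrame(
    {"co2": [316.1, np.nan, 317.5, np.nan]},
    index=pd.date_range("1958-05-03", periods=4, freq="7D"),
)

# .isnull() returns a same-shaped DataFrame: True where a value is missing
missing_mask = df.isnull()

# .notnull() is the element-wise inverse: True where a value is present
present_mask = df.notnull()

print(missing_mask)
print(present_mask)
```

pandas also provides .isna() and .notna(), which behave identically to .isnull() and .notnull().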

4. Counting missing values in a DataFrame

If you are interested in how many values are missing, you can chain the .isnull() method with the .sum() method to count the number of missing values in each column of the df DataFrame. This works because df.isnull() returns True for every missing entry, and .sum() treats True as 1 and False as 0, so summing down each column gives the number of missing values in that column.
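The sketch below counts missing values per column and, for comparison, the number of rows that contain at least one missing value. The two columns (co2 and a second column named other) and their values are hypothetical. The row count uses .any(axis=1), which is an addition beyond what the text describes.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in two columns (values are made up)
df = pd.DataFrame(
    {
        "co2": [316.1, np.nan, 317.5, np.nan],
        "other": [1.0, 1.0, np.nan, 1.0],
    },
    index=pd.date_range("1958-05-03", periods=4, freq="7D"),
)

# Missing values per column: each True counts as 1, each False as 0
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Rows containing at least one missing value in any column
rows_with_missing = df.isnull().any(axis=1).sum()
print(rows_with_missing)
```

The per-column total (3 here) and the row count (also 3) only coincide because no row has more than one gap. In general, they can differ.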

5. Replacing missing values in a DataFrame

If you do not handle missing values in time series data, they will show up as empty gaps in your graph. It is therefore often preferable to impute them with a numerical value. Typical choices are the mean of the time series, the value from the preceding timepoint, or the value from the following timepoint. To replace missing values in your time series data, you can use the .fillna() method in pandas. Pay attention to its method argument, which specifies how the missing data is filled. Using the bfill method (backfilling) replaces each missing value with the next valid observation. The ffill method (forward-filling) instead replaces each missing value with the most recent non-missing value. Here, we used the bfill method, which means that the value for the date 1958-05-10 was backfilled with the value from the date 1958-05-17.
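The following sketch shows the three imputation strategies from the text on a hypothetical three-week series with a gap on 1958-05-10. The values are invented for illustration. It uses the .bfill() and .ffill() methods, which give the same result as .fillna(method="bfill") and .fillna(method="ffill"). The method argument of .fillna() is deprecated as of pandas 2.1.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly series with one gap on 1958-05-10 (values are made up)
df = pd.DataFrame(
    {"co2": [316.1, np.nan, 317.5]},
    index=pd.date_range("1958-05-03", periods=3, freq="7D"),
)

# Backfill: the gap takes the next valid value (from 1958-05-17)
backfilled = df.bfill()

# Forward-fill: the gap takes the most recent valid value (from 1958-05-03)
forward_filled = df.ffill()

# Mean imputation: the gap takes the column mean (NaN is skipped by .mean())
mean_filled = df.fillna(df.mean())

print(backfilled)
print(forward_filled)
print(mean_filled)
```

A trailing gap has no later value, so backfilling leaves it as NaN. Likewise, a leading gap stays NaN after forward-filling.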

6. Let's practice!

Now let's go deal with those missing values ourselves!
