Get startedGet started for free

Handling outliers

In this exercise, you'll handle outliers - data points that are so different from the rest of your data, that you treat them differently from other "normal-looking" data points. You'll use the output from the previous exercise (percent change over time) to detect the outliers. First you will write a function that replaces outlier data points with the median value from the entire time series.

This exercise is part of the course

Machine Learning for Time Series Data in Python

View Course

Exercise instructions

  • Define a function that takes an input series and does the following:
    • Calculates the absolute value of each datapoint's distance from the series mean, then creates a boolean mask for datapoints that are three times the standard deviation from the mean.
    • Use this boolean mask to replace the outliers with the median of the entire series.
  • Apply this function to your data and visualize the results using the given code.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

def replace_outliers(series):
    # Calculate the absolute difference of each timepoint from the series mean
    absolute_differences_from_mean = np.abs(series - np.mean(series))
    
    # Calculate a mask for the differences that are > 3 standard deviations from zero
    this_mask = absolute_differences_from_mean > (np.____(series) * ____)
    
    # Replace these values with the median accross the data
    series[this_mask] = np.____(series)
    return series

# Apply your preprocessing function to the timeseries and plot the results
prices_perc = prices_perc.____
prices_perc.loc["2014":"2015"].plot()
plt.show()
Edit and Run Code