Handling outliers

In this exercise, you'll handle outliers - data points that are so different from the rest of your data, that you treat them differently from other "normal-looking" data points. You'll use the output from the previous exercise (percent change over time) to detect the outliers. First you will write a function that replaces outlier data points with the median value from the entire time series.

Cet exercice fait partie du cours

Machine Learning for Time Series Data in Python

Afficher le cours

Instructions

Define a function that takes an input series and does the following:
- Calculates the absolute value of each datapoint's distance from the series mean, then creates a boolean mask for datapoints that are three times the standard deviation from the mean.
- Use this boolean mask to replace the outliers with the median of the entire series.
Apply this function to your data and visualize the results using the given code.

Exercice interactif pratique

Essayez cet exercice en complétant cet exemple de code.

def replace_outliers(series):
    # Calculate the absolute difference of each timepoint from the series mean
    absolute_differences_from_mean = np.abs(series - np.mean(series))
    
    # Calculate a mask for the differences that are > 3 standard deviations from zero
    this_mask = absolute_differences_from_mean > (np.____(series) * ____)
    
    # Replace these values with the median accross the data
    series[this_mask] = np.____(series)
    return series

# Apply your preprocessing function to the timeseries and plot the results
prices_perc = prices_perc.____
prices_perc.loc["2014":"2015"].plot()
plt.show()

Modifier et exécuter le code