Handling outliers
In this exercise, you'll handle outliers - data points that are so different from the rest of your data, that you treat them differently from other "normal-looking" data points. You'll use the output from the previous exercise (percent change over time) to detect the outliers. First you will write a function that replaces outlier data points with the median value from the entire time series.
Diese Übung ist Teil des Kurses
Machine Learning for Time Series Data in Python
Anleitung zur Übung
- Define a function that takes an input series and does the following:
- Calculates the absolute value of each datapoint's distance from the series mean, then creates a boolean mask for datapoints that are three times the standard deviation from the mean.
- Use this boolean mask to replace the outliers with the median of the entire series.
- Apply this function to your data and visualize the results using the given code.
Interaktive Übung
Vervollständige den Beispielcode, um diese Übung erfolgreich abzuschließen.
def replace_outliers(series):
# Calculate the absolute difference of each timepoint from the series mean
absolute_differences_from_mean = np.abs(series - np.mean(series))
# Calculate a mask for the differences that are > 3 standard deviations from zero
this_mask = absolute_differences_from_mean > (np.____(series) * ____)
# Replace these values with the median accross the data
series[this_mask] = np.____(series)
return series
# Apply your preprocessing function to the timeseries and plot the results
prices_perc = prices_perc.____
prices_perc.loc["2014":"2015"].plot()
plt.show()