Aan de slagGa gratis aan de slag

Handling outliers

In this exercise, you'll handle outliers - data points that are so different from the rest of your data, that you treat them differently from other "normal-looking" data points. You'll use the output from the previous exercise (percent change over time) to detect the outliers. First you will write a function that replaces outlier data points with the median value from the entire time series.

Deze oefening maakt deel uit van de cursus

Machine Learning for Time Series Data in Python

Cursus bekijken

Oefeninstructies

  • Define a function that takes an input series and does the following:
    • Calculates the absolute value of each datapoint's distance from the series mean, then creates a boolean mask for datapoints that are three times the standard deviation from the mean.
    • Use this boolean mask to replace the outliers with the median of the entire series.
  • Apply this function to your data and visualize the results using the given code.

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

def replace_outliers(series):
    # Calculate the absolute difference of each timepoint from the series mean
    absolute_differences_from_mean = np.abs(series - np.mean(series))
    
    # Calculate a mask for the differences that are > 3 standard deviations from zero
    this_mask = absolute_differences_from_mean > (np.____(series) * ____)
    
    # Replace these values with the median accross the data
    series[this_mask] = np.____(series)
    return series

# Apply your preprocessing function to the timeseries and plot the results
prices_perc = prices_perc.____
prices_perc.loc["2014":"2015"].plot()
plt.show()
Code bewerken en uitvoeren