Tuning the window size

You want to check for yourself that the optimal window size for the arrhythmia dataset is 50. You have been given the dataset as a pandas DataFrame called arrh, and you want to use a subset of the data ending at time t_now. Your test data is available as X_test and y_test. You will try out a number of window sizes ranging from 10 to 100 (stored in wrange), fit a naive Bayes classifier to each window, assess its F1 score on the test data, and then pick the best-performing window size. You also have numpy available as np, and the function f1_score() has been imported already. Finally, an empty list called accuracies has been initialized for you to store the F1 score of each window.
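
The procedure described above can be tried outside the course environment with a minimal sketch like the following. Everything in it is an assumption for illustration: arrh is replaced by hypothetical synthetic data with two features (f0, f1) and a binary class column whose decision rule changes at row 200 (a simulated concept drift), t_now is set to 249, the test set follows the post-drift rule, and wrange is assumed to be range(10, 101, 10). It uses drop(columns="class") instead of a positional axis argument, because pandas 2.0 and later accept only the labels positionally.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.naive_bayes import GaussianNB

# Hypothetical synthetic stream standing in for arrh: two features plus a
# binary 'class' column whose decision rule flips at row 200 (simulated drift)
rng = np.random.default_rng(0)
n = 300
features = rng.normal(size=(n, 2))
labels = (features[:, 0] > 0).astype(int)
labels[200:] = (features[200:, 1] > 0).astype(int)  # new concept from row 200
arrh = pd.DataFrame(features, columns=["f0", "f1"])
arrh["class"] = labels

t_now = 249  # hypothetical: last time step available for training
# Hypothetical test set drawn from the post-drift concept
X_test = pd.DataFrame(rng.normal(size=(100, 2)), columns=["f0", "f1"])
y_test = (X_test["f1"] > 0).astype(int)

wrange = range(10, 101, 10)  # assumed candidate sizes 10, 20, ..., 100
accuracies = []
for w_size in wrange:
    # Label-based slice covering rows t_now - w_size + 1 .. t_now (inclusive)
    sliding = arrh.loc[(t_now - w_size + 1):t_now]
    X, y = sliding.drop(columns="class"), sliding["class"]
    # Train on the window only, then score on the fixed test set
    preds = GaussianNB().fit(X, y).predict(X_test)
    accuracies.append(f1_score(y_test, preds))

# Window size whose classifier achieved the highest F1 score
optimal_window = wrange[np.argmax(accuracies)]
print(optimal_window, max(accuracies))
```

Which window wins here depends on the random draw and on where the simulated drift sits, so treat the printed values as illustrative only.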

This exercise is part of the course

Designing Machine Learning Workflows in Python

Exercise instructions

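  • Use the .loc indexer to select a sliding window of w_size rows ending at t_now. Because .loc slicing is label-based and includes both endpoints, the window starts at label t_now - w_size + 1.
  • Construct X by dropping the class column from the sliding window, and store that column separately as y.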
  • Fit a naive Bayes classifier to X and y, and use it to predict the labels of the test data X_test.
  • Compute the F1 score of these predictions for each window size, and find the best-performing window size.
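
The inclusive-endpoint behaviour of .loc is the easiest part of the first step to get wrong. The sketch below assumes, as the exercise's slicing suggests, that the index holds consecutive integer time steps; df is a hypothetical stand-in for arrh.

```python
import pandas as pd

# Hypothetical toy frame with a default RangeIndex 0..19 standing in for arrh
df = pd.DataFrame({"x": range(20), "class": [0, 1] * 10})
t_now, w_size = 14, 5

# .loc slices by label and includes BOTH endpoints,
# so starting at t_now - w_size + 1 yields exactly w_size rows
window = df.loc[(t_now - w_size + 1):t_now]
print(window.index.tolist())  # [10, 11, 12, 13, 14]

# Starting at t_now - w_size instead returns w_size + 1 rows
too_long = df.loc[(t_now - w_size):t_now]
print(len(too_long))  # 6
```

Positional slicing with .iloc would behave differently, since it excludes the stop position.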

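import numpy as np
from sklearn.metrics import f1_score
from sklearn.naive_bayes import GaussianNB

# arrh, t_now, X_test, y_test, wrange (the candidate window sizes) and
# the empty list accuracies are provided by the exercise environment

# Loop over window sizes
for w_size in wrange:

    # Define sliding window: .loc includes both ends, so this selects
    # the w_size rows ending at t_now
    sliding = arrh.loc[(t_now - w_size + 1):t_now]

    # Extract X and y from the sliding window
    X, y = sliding.drop('class', axis=1), sliding['class']

    # Fit the classifier and store the F1 score
    preds = GaussianNB().fit(X, y).predict(X_test)
    accuracies.append(f1_score(y_test, preds))

# Estimate the best performing window size
optimal_window = wrange[np.argmax(accuracies)]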
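
For the final step, note that np.argmax() returns the index of the first maximum, so when several window sizes tie on F1 score, the smallest of them (given an ascending wrange) is selected. The scores below are made-up values used only to show the tie-breaking, not results on the arrhythmia data; wrange is again assumed to be range(10, 101, 10).

```python
import numpy as np

wrange = range(10, 101, 10)  # assumed candidate window sizes
# Made-up scores for illustration only; positions 4 and 6 (windows 50 and 70) tie
accuracies = [0.6, 0.6, 0.7, 0.7, 0.8, 0.7, 0.8, 0.7, 0.7, 0.7]

best = np.argmax(accuracies)  # index of the FIRST maximum
optimal_window = wrange[best]
print(best, optimal_window)  # 4 50
```

If larger windows should win ties instead, one option is to search the reversed lists, but the exercise itself relies on the default first-maximum behaviour.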