Tuning the window size
You want to check for yourself that the optimal window size for the arrhythmia dataset is 50. You have been given the dataset as a pandas data frame called arrh, and want to use a subset of the data up to time t_now. Your test data is available as X_test, y_test. You will try out a number of window sizes, ranging from 10 to 100, fit a naive Bayes classifier to each window, assess its F1 score on the test data, and then pick the best-performing window size. You also have numpy available as np, and the function f1_score() has been imported already. Finally, an empty list called accuracies has been initialized for you to store the F1 score of each window.
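A minimal sketch of how a label-based sliding window behaves, assuming the data frame has a default integer index (the toy frame and the values of t_now and w_size below are made up for illustration). Because .loc slices include both endpoints, the start label is offset by one so that the window holds exactly w_size rows ending at t_now. This is one plausible way to define the window, not necessarily the answer the exercise expects.

```python
import pandas as pd

# Toy stand-in for arrh (assumption): default integer index 0..9.
toy = pd.DataFrame({"feature": range(10), "class": [0, 1] * 5})

# Hypothetical values chosen for illustration only.
t_now, w_size = 7, 3

# .loc slices by label and includes BOTH endpoints, so the start label
# is shifted by one to get exactly w_size rows ending at t_now.
window = toy.loc[t_now - w_size + 1 : t_now]

print(window.index.tolist())  # [5, 6, 7]
```

With positional slicing (.iloc) the end point would be excluded instead, so the offsets would differ.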
This exercise is part of the course
Designing Machine Learning Workflows in Python
Exercise instructions
- Define the index of a sliding window of size w_size stopping at t_now using the .loc indexer.
- Construct X from the sliding window by removing the class column. Store that latter column as y.
- Fit a naive Bayes classifier to X and y, and use it to predict the labels of the test data X_test.
- Compute the F1 score of these predictions for each window size, and find the best-performing window size.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Loop over window sizes
for w_size in wrange:

    # Define sliding window
    sliding = arrh.____[____:t_now]

    # Extract X and y from the sliding window
    X, y = sliding.____('class', ____), sliding[____]

    # Fit the classifier and store the F1 score
    preds = GaussianNB().fit(____, ____).____(X_test)
    accuracies.append(____(____, ____))

# Estimate the best performing window size
optimal_window = ____[np.____(accuracies)]
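For reference, here is a self-contained sketch of the same loop run on synthetic data. Everything about the data is an assumption: the stand-in arrh frame, the value of t_now, the train/test split, and the definition of wrange as range(10, 101, 10) are all made up so the code can run on its own, and the result says nothing about the real arrhythmia dataset. The sketch uses drop(columns='class') rather than a positional axis argument, since recent pandas versions may not accept the positional form.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-in for arrh (assumption): 300 rows, two noisy features,
# and an alternating binary 'class' so every window contains both classes.
n = 300
labels = np.arange(n) % 2
arrh = pd.DataFrame({
    "feat_a": rng.normal(loc=labels, scale=1.0),
    "feat_b": rng.normal(loc=-labels, scale=1.0),
    "class": labels,
})

# Hypothetical split: rows after t_now serve as the test set.
t_now = 249
X_test = arrh.loc[t_now + 1 :].drop(columns="class")
y_test = arrh.loc[t_now + 1 :, "class"]

# Assumed candidate window sizes, 10 to 100 in steps of 10.
wrange = range(10, 101, 10)
accuracies = []  # holds one F1 score per window size

for w_size in wrange:
    # Label-based window of exactly w_size rows ending at t_now
    sliding = arrh.loc[t_now - w_size + 1 : t_now]

    # Features are everything except 'class'; 'class' is the target
    X, y = sliding.drop(columns="class"), sliding["class"]

    # Fit on the window, predict the test data, store the F1 score
    preds = GaussianNB().fit(X, y).predict(X_test)
    accuracies.append(f1_score(y_test, preds))

# Window size whose F1 score is highest (first one in case of ties)
optimal_window = wrange[int(np.argmax(accuracies))]
print(optimal_window)
```

np.argmax returns the position of the largest score, and indexing wrange with that position maps it back to a window size.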