Aan de slagGa gratis aan de slag

Tuning n_neighbors

n_neighbors is the most crucial parameter of KNN. When you are unsure about the number of outliers in the dataset (which happens often), you can't use the rule of thumb that suggests using 20 neighbors when contamination is below 10%.

For such cases, you'll have to tune n_neighbors. Practice the process on the transformed version of the females dataset from the last exercise. It has been loaded as females_transformed. KNN estimator, evaluate_outlier_classifier and evaluate_regressor functions are also loaded.

Here are the function bodies as reminders:

def evaluate_outlier_classifier(model, data, threshold=.75):
    model.fit(data)

    probs = model.predict_proba(data)
    inliers = data[probs[:, 1] <= threshold]

    return inliers

def evaluate_regressor(inliers):
    X, y = inliers.drop("weightkg", axis=1), inliers[['weightkg']]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=10, train_size=0.8)

    lr = LinearRegression()
    lr.fit(X_train, y_train)

    preds = lr.predict(X_test)
    rmse = root_mean_squared_error(y_test, preds)

    return round(rmse, 3)

Deze oefening maakt deel uit van de cursus

Anomaly Detection in Python

Cursus bekijken

Oefeninstructies

  • Create a list of possible values for n_neighbors in that order: 5, 10, 20
  • Instantiate a KNN model, setting the value of n_neighbors to the current k in the loop.
  • Find the inliers using the evaluate_outlier_classifier function.
  • Calculate RMSE with evaluate_regressor and store the result into scores with k as the key and RMSE as the value.

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

# Create a list of values for n_neigbors
n_neighbors = [____, ____, ____]
scores = dict()

for k in n_neighbors:
    # Instantiate KNN with the current k
    knn = ____(____, n_jobs=-1)
    
    # Find the inliers with the current KNN
    inliers = ____(____, ____, .50)
    
    # Calculate and store RMSE into scores
    scores[____] = ____
    
print(scores)
Code bewerken en uitvoeren