Get startedGet started for free

Tuning n_neighbors

n_neighbors is the most crucial parameter of KNN. When you are unsure about the number of outliers in the dataset (which happens often), you can't use the rule of thumb that suggests using 20 neighbors when contamination is below 10%.

For such cases, you'll have to tune n_neighbors. Practice the process on the transformed version of the females dataset from the last exercise. It has been loaded as females_transformed. KNN estimator, evaluate_outlier_classifier and evaluate_regressor functions are also loaded.

Here are the function bodies as reminders:

def evaluate_outlier_classifier(model, data, threshold=.75):
    model.fit(data)

    probs = model.predict_proba(data)
    inliers = data[probs[:, 1] <= threshold]

    return inliers

def evaluate_regressor(inliers):
    X, y = inliers.drop("weightkg", axis=1), inliers[['weightkg']]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=10, train_size=0.8)

    lr = LinearRegression()
    lr.fit(X_train, y_train)

    preds = lr.predict(X_test)
    rmse = mean_squared_error(y_test, preds, squared=False)

    return round(rmse, 3)

This exercise is part of the course

Anomaly Detection in Python

View Course

Exercise instructions

  • Create a list of possible values for n_neighbors in that order: 5, 10, 20
  • Instantiate a KNN model, setting the value of n_neighbors to the current k in the loop.
  • Find the inliers using the evaluate_outlier_classifier function.
  • Calculate RMSE with evaluate_regressor and store the result into scores with k as the key and RMSE as the value.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create a list of values for n_neigbors
n_neighbors = [____, ____, ____]
scores = dict()

for k in n_neighbors:
    # Instantiate KNN with the current k
    knn = ____(____, n_jobs=-1)
    
    # Find the inliers with the current KNN
    inliers = ____(____, ____, .50)
    
    # Calculate and store RMSE into scores
    scores[____] = ____
    
print(scores)
Edit and Run Code