
Tuning the aggregation method

Once the optimal number of neighbors is found, it's time to tune the distance aggregation method. If n_neighbors is 10, each datapoint will have ten distance measurements to its nearest neighbors. KNN can aggregate those distances using one of three methods: largest, mean, and median.
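
For reference, the aggregation method is controlled by the method parameter of PyOD's KNN detector. Here is a minimal sketch (X_sample is a hypothetical feature matrix, not part of the exercise):

import numpy as np
from pyod.models.knn import KNN

X_sample = np.random.rand(100, 3)  # hypothetical feature matrix

# Aggregate each point's neighbor distances with the mean
# instead of the default 'largest'
knn = KNN(n_neighbors=10, method='mean')
knn.fit(X_sample)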

Find out which works best for the females_transformed dataset. The KNN estimator and the evaluate_outlier_classifier and evaluate_regressor functions are already loaded for you.

Here are the function bodies as reminders:

# These imports are already loaded in the exercise environment
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def evaluate_outlier_classifier(model, data, threshold=.75):
    # Fit the detector and keep only rows whose outlier
    # probability is at or below the threshold
    model.fit(data)

    probs = model.predict_proba(data)
    inliers = data[probs[:, 1] <= threshold]

    return inliers

def evaluate_regressor(inliers):
    # Fit a linear regression on the inliers and report the test RMSE
    X, y = inliers.drop("weightkg", axis=1), inliers[['weightkg']]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=10, train_size=0.8)

    lr = LinearRegression()
    lr.fit(X_train, y_train)

    preds = lr.predict(X_test)
    rmse = mean_squared_error(y_test, preds, squared=False)

    return round(rmse, 3)
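
Put together, one evaluation round looks like the sketch below, assuming females_transformed is the preloaded DataFrame mentioned above:

from pyod.models.knn import KNN

# Drop likely outliers, then score a linear regression on what remains
knn = KNN(n_neighbors=10, method='mean', n_jobs=-1)
inliers = evaluate_outlier_classifier(knn, females_transformed, threshold=0.5)
rmse = evaluate_regressor(inliers)
print(rmse)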

This exercise is part of the course Anomaly Detection in Python.

Exercise instructions

  • Loop over the product of n_neighbors and methods, and instantiate KNN with the temporary variables k and m.
  • Find the inliers with the current KNN and a threshold of 50%.
  • Calculate the RMSE and store it in scores with (k, m) as the key and the RMSE as the value.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

n_neighbors = [5, 20]
methods = ['largest', 'mean', 'median']
scores = dict()

for k, m in ____:
    # Create a KNN instance
    knn = KNN(____, ____, n_jobs=-1)
    
    # Find the inliers with the current KNN
    inliers = ____

    # Calculate and store RMSE into scores
    scores[(k, m)] = ____
    
print(scores)
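
If you get stuck, one possible completion of the scaffold is sketched below. It assumes itertools.product for the loop, plus the preloaded females_transformed DataFrame, KNN estimator, and helper functions described above:

from itertools import product
from pyod.models.knn import KNN

n_neighbors = [5, 20]
methods = ['largest', 'mean', 'median']
scores = dict()

for k, m in product(n_neighbors, methods):
    # Create a KNN instance with the current neighbor count and method
    knn = KNN(n_neighbors=k, method=m, n_jobs=-1)

    # Find the inliers with the current KNN and a 50% threshold
    inliers = evaluate_outlier_classifier(knn, females_transformed, threshold=0.5)

    # Calculate and store RMSE into scores
    scores[(k, m)] = evaluate_regressor(inliers)

print(scores)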