Tuning n_neighbors

n_neighbors is the most important parameter of KNN. When you don't know how many outliers the dataset contains (which happens often), you can't rely on the rule of thumb that suggests 20 neighbors when contamination is below 10%.
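To see why the choice of n_neighbors matters, here is a minimal sketch in plain Python. It is not the KNN estimator used in this exercise: kth_neighbor_distance is a hypothetical stand-in that scores each point by the distance to its k-th nearest neighbor, and the data and the cutoff of 1.0 are invented for illustration.

```python
def kth_neighbor_distance(values, k):
    """Score each point by the distance to its k-th nearest neighbor (1-D data)."""
    scores = []
    for i, v in enumerate(values):
        # Distances from this point to every other point, smallest first
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(dists[k - 1])
    return scores


# Six points near 1.0 plus a small clump of two points near 5.0 (made-up data)
data = [1.0, 1.1, 0.9, 1.2, 1.05, 0.95, 5.0, 5.1]
cutoff = 1.0  # hypothetical cutoff on the raw distance score

kept_per_k = {}
for k in [1, 2, 3]:
    scores = kth_neighbor_distance(data, k)
    inliers = [v for v, s in zip(data, scores) if s <= cutoff]
    kept_per_k[k] = len(inliers)

print(kept_per_k)
```

With k=1 the two points near 5.0 are each other's nearest neighbor, so both pass as inliers. With k=2 or k=3 their score jumps to the distance from the main cluster and they are dropped. Too small a k can hide a small group of outliers, which is why the value is worth tuning.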

In such cases, you have to tune n_neighbors. Practice the process on the transformed version of the females dataset from the last exercise, which has been loaded as females_transformed. The KNN estimator and the evaluate_outlier_classifier and evaluate_regressor functions are also loaded.

Here are the function bodies as reminders:

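from sklearn.linear_model import LinearRegression
from sklearn.metrics import root_mean_squared_error
from sklearn.model_selection import train_test_split

def evaluate_outlier_classifier(model, data, threshold=.75):
    # Fit the outlier detector on the full dataset
    model.fit(data)

    # Column 1 is used as the outlier probability of each row
    probs = model.predict_proba(data)
    # Keep rows whose outlier probability does not exceed the threshold
    inliers = data[probs[:, 1] <= threshold]

    return inliers

def evaluate_regressor(inliers):
    # Predict weightkg from all remaining columns
    X, y = inliers.drop("weightkg", axis=1), inliers[['weightkg']]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=10, train_size=0.8)

    lr = LinearRegression()
    lr.fit(X_train, y_train)

    # Score the model on the held-out 20% of the inliers
    preds = lr.predict(X_test)
    rmse = root_mean_squared_error(y_test, preds)

    return round(rmse, 3)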
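The threshold argument of evaluate_outlier_classifier decides how many rows survive the filter. The sketch below reproduces only that filtering step in plain Python, on hand-made probabilities. The row labels, the probability values, and the keep_inliers helper are all hypothetical, and each row is assumed to be [inlier probability, outlier probability], as the probs[:, 1] filter implies.

```python
# Hypothetical probabilities: each row is [p_inlier, p_outlier]
probs = [
    [0.90, 0.10],
    [0.40, 0.60],
    [0.20, 0.80],
    [0.70, 0.30],
]
rows = ["a", "b", "c", "d"]


def keep_inliers(rows, probs, threshold):
    """Keep rows whose outlier probability is at or below the threshold."""
    return [row for row, p in zip(rows, probs) if p[1] <= threshold]


strict = keep_inliers(rows, probs, 0.50)   # drops b and c
lenient = keep_inliers(rows, probs, 0.75)  # drops only c

print(strict)
print(lenient)
```

A lower threshold removes more rows, so the .50 passed in the sample code further down is stricter than the function's default of .75.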

This exercise is part of the course

Anomaly Detection in Python

Exercise instructions

Create a list of possible values for n_neighbors in this order: 5, 10, 20.
Instantiate a KNN model, setting n_neighbors to the current k in the loop.
Find the inliers using the evaluate_outlier_classifier function.
Calculate the RMSE with evaluate_regressor and store it in scores, with k as the key and the RMSE as the value.

Interactive hands-on exercise

Try to solve this exercise by completing the sample code.

# Create a list of values for n_neighbors
n_neighbors = [____, ____, ____]
scores = dict()

for k in n_neighbors:
    # Instantiate KNN with the current k
    knn = ____(____, n_jobs=-1)
    
    # Find the inliers with the current KNN
    inliers = ____(____, ____, .50)
    
    # Calculate and store RMSE into scores
    scores[____] = ____
    
print(scores)
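Once scores is filled, the n_neighbors value with the lowest RMSE can be read off directly. A minimal sketch, assuming a dictionary shaped like the one built in the loop; the RMSE values here are invented placeholders, not results from the females dataset.

```python
# Hypothetical scores: keys are n_neighbors values, values are RMSE
scores = {5: 1.234, 10: 1.198, 20: 1.251}

# Pick the key whose RMSE is smallest
best_k = min(scores, key=scores.get)

print(best_k, scores[best_k])
```

Lower RMSE is better here, so min is the right reduction. For a metric where higher is better, max would be used instead.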
