Detecting data drift using the Kolmogorov-Smirnov test

After successfully deploying your heart disease prediction model, you've been monitoring its performance and input data. You've noticed that the distribution of some key features in the data collected in February looks different from the January data you trained on. Such a shift in the input distribution, known as data drift, can degrade the model's performance, so it's crucial to detect and address it.

In this exercise, you will use the Kolmogorov-Smirnov (K-S) test to detect any potential data drift between the January and February datasets. Sample datasets called january_data and february_data are already loaded for you.
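
As background on what the test measures: the two-sample K-S statistic is the largest vertical gap between the empirical cumulative distribution functions of the two samples. The sketch below computes that gap by hand and compares it with `scipy.stats.ks_2samp`. The helper `ks_statistic` and the synthetic samples `sample_a` and `sample_b` are illustrative assumptions, not part of the exercise.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_statistic(a, b):
    """Hypothetical helper: largest gap between the two empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    # Evaluate both empirical CDFs at every observed value.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))


# Synthetic samples (assumed for illustration): same spread, shifted mean.
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.0, scale=1.0, size=200)
sample_b = rng.normal(loc=0.5, scale=1.0, size=200)

manual = ks_statistic(sample_a, sample_b)
scipy_stat = ks_2samp(sample_a, sample_b).statistic
print(manual, scipy_stat)  # the two values should agree
```

A statistic near 0 means the two empirical distributions almost coincide; a value near 1 means they barely overlap.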

This exercise is part of the course

End-to-End Machine Learning

Exercise instructions

  • Import the ks_2samp function from the scipy.stats module.
  • Use the provided sample datasets january_data and february_data to perform the Kolmogorov-Smirnov test; calculate the test statistic and p-value.
  • Check if the p-value is less than 0.05, indicating data drift; if data drift is detected, print "Data drift detected.", otherwise, print "No data drift detected."

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Import the ks_2samp function
from ____.____ import ____

# Calculate the test statistic and p value
test_statistic, p_value = ____(____, ____)

# Check the p-value and print the detection result
if ____:
    print("Data drift detected.")
else:
    print("No data drift detected.")
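
For reference, the same workflow can be sketched end to end on synthetic data. This is a minimal sketch under stated assumptions: the arrays below are made-up one-dimensional stand-ins for the preloaded `january_data` and `february_data` (whose real contents are not shown here), and `detect_drift` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import ks_2samp


def detect_drift(reference, current, alpha=0.05):
    """Hypothetical helper: flag drift when the K-S p-value is below alpha."""
    statistic, p_value = ks_2samp(reference, current)
    return bool(p_value < alpha), statistic, p_value


# Synthetic stand-ins (assumed): one numeric feature, with the mean
# shifted upward in the second month.
rng = np.random.default_rng(42)
january_data = rng.normal(loc=130, scale=15, size=500)
february_data = rng.normal(loc=145, scale=15, size=500)

drifted, test_statistic, p_value = detect_drift(january_data, february_data)
if drifted:
    print("Data drift detected.")
else:
    print("No data drift detected.")
```

A p-value below 0.05 means rejecting, at the 5% level, the hypothesis that both samples come from the same distribution. `ks_2samp` compares one-dimensional samples, so a dataset with several features would need one test per feature.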