Computing the K-S statistic
Write a function to compute the Kolmogorov-Smirnov statistic from two datasets, data1 and data2, in which data2 consists of samples from the theoretical distribution you are comparing your data to. Note that this means we are using hacker stats to compute the K-S statistic for a dataset and a theoretical distribution, not the K-S statistic for two empirical datasets. Conveniently, the function you selected in the previous exercise for computing values of the formal ECDF is available as dcst.ecdf_formal().
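For readers working outside the course environment, here is a minimal NumPy sketch of what the two dcst helpers are assumed to compute. The names ecdf_sketch and ecdf_formal_sketch are hypothetical stand-ins, not the dc_stat_think source. The first returns the sorted data with ECDF heights i/n. The second evaluates the formal ECDF of a dataset at arbitrary points, i.e. the fraction of data values less than or equal to each point.

```python
import numpy as np


def ecdf_sketch(data):
    """Hypothetical stand-in for dcst.ecdf(): sorted x-values and ECDF heights i/n."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def ecdf_formal_sketch(x, data):
    """Hypothetical stand-in for dcst.ecdf_formal(): fraction of data <= each x."""
    # side="right" counts values equal to x, so the result is P(X <= x), not P(X < x)
    return np.searchsorted(np.sort(data), x, side="right") / len(data)


x, y = ecdf_sketch([3.0, 1.0, 2.0])    # x -> 1, 2, 3; y -> 1/3, 2/3, 1
vals = ecdf_formal_sketch([1.0, 2.5, 3.0], [1.0, 2.0, 2.0, 3.0])  # -> 0.25, 0.75, 1.0
print(x, y, vals)
```

Because ecdf_formal can be evaluated at any point, it can measure a sample-based "theoretical" CDF exactly at the corners of data1's ECDF.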
This exercise is part of the course Case Studies in Statistical Thinking.
Exercise instructions
- Compute the values of the concave corners of the formal ECDF for data1 using dcst.ecdf(). Store the results in the variables x and y.
- Use dcst.ecdf_formal() to compute the values of the theoretical CDF, determined from data2, at the corner positions x. Store the result in the variable cdf.
- Compute the distances between the concave corners of the formal ECDF and the theoretical CDF (ECDF value minus CDF value). Store the result as D_top.
- Compute the distances between the convex corners of the formal ECDF and the theoretical CDF (CDF value minus ECDF value). Note that you will need to subtract 1/len(data1) from y to get the y-value at each convex corner. Store the result in D_bottom.
- Return the K-S statistic as the maximum of all entries in D_top and D_bottom. You can pass D_top and D_bottom together as a tuple to np.max() to do this.
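To see why both sets of corners matter, here is a small worked example. For illustration it assumes an exact Uniform(0, 1) CDF, F(x) = x, instead of samples in data2, and it uses made-up values for data1. The concave (top) corners sit at heights i/n and the convex (bottom) corners at (i - 1)/n, so the two distance arrays are y - F(x) and F(x) - (y - 1/n).

```python
import numpy as np

# Hypothetical data; the target CDF is Uniform(0, 1), so F(x) = x on [0, 1]
data1 = np.array([0.5, 0.2, 0.9])

x = np.sort(data1)                 # corner positions: 0.2, 0.5, 0.9
n = len(x)
y = np.arange(1, n + 1) / n        # concave (top) corner heights: 1/3, 2/3, 1
cdf = x                            # exact uniform CDF evaluated at the corners

D_top = y - cdf                    # ECDF minus CDF at the top corners
D_bottom = cdf - y + 1 / n         # CDF minus ECDF at the bottom corners, height y - 1/n
ks = np.max((D_top, D_bottom))     # max over all six entries

print(D_top)     # approx 0.133, 0.167, 0.1
print(D_bottom)  # approx 0.2, 0.167, 0.233
print(ks)        # approx 0.233, from the bottom corner at x = 0.9
```

Using D_top alone would report about 0.167 and miss the larger gap just below the jump at 0.9. For an analytic CDF like this one, scipy.stats.kstest(data1, "uniform") returns the same one-sample statistic.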
Have a go at this exercise by completing this sample code.
import numpy as np
import dc_stat_think as dcst

def ks_stat(data1, data2):
    # Compute ECDF from data: x, y
    x, y = ____
    # Compute corresponding values of the target CDF
    cdf = ____
    # Compute distances between concave corners and CDF
    D_top = ____ - ____
    # Compute distance between convex corners and CDF
    D_bottom = ____ - ____ + ____/____
    # The K-S statistic is the largest of all corner distances
    return np.max((D_top, D_bottom))
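One possible completion of the template follows, written as a self-contained sketch. It inlines NumPy equivalents of the dcst helpers, which are assumptions about their behavior, so it runs without the course package. The name ks_stat_sketch and the exponential example data are illustrative assumptions. In the course environment, the dcst calls would be used instead.

```python
import numpy as np


def ks_stat_sketch(data1, data2):
    """K-S statistic between data1 and a target CDF approximated by samples in data2.

    Assumes the formal ECDF convention F(x) = fraction of points <= x.
    """
    # ECDF of data1: sorted values and concave (top) corner heights i/n
    x = np.sort(data1)
    n = len(x)
    y = np.arange(1, n + 1) / n

    # Target CDF at the corners, estimated as the formal ECDF of data2
    cdf = np.searchsorted(np.sort(data2), x, side="right") / len(data2)

    # Distances at the concave (top) corners
    D_top = y - cdf

    # Distances at the convex (bottom) corners, whose height is y - 1/n
    D_bottom = cdf - y + 1 / n

    # np.max on the tuple stacks both arrays and takes the max over every entry
    return np.max((D_top, D_bottom))


# Illustrative use: compare data with many draws from an exponential model
# whose mean matches the data (a hypothetical choice of model)
rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=50)
theor = rng.exponential(scale=np.mean(data), size=10_000)
print(ks_stat_sketch(data, theor))
```

Since D_top + D_bottom = 1/n at every corner, at least one of the two arrays always has a positive entry, so the returned value lies in (0, 1].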