Computing the K-S statistic
Write a function to compute the Kolmogorov-Smirnov statistic from two datasets, data1 and data2, in which data2 consists of samples from the theoretical distribution you are comparing your data to. Note that this means we are using hacker stats to compute the K-S statistic for a dataset and a theoretical distribution, not the K-S statistic for two empirical datasets. Conveniently, the function you just selected for computing values of the formal ECDF is given as dcst.ecdf_formal().
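To make the role of data2 concrete, here is a minimal NumPy sketch of how a formal ECDF built from many samples can stand in for a theoretical CDF. The helper name ecdf_formal_sketch, the seed, and the choice of a standard Normal distribution are assumptions for illustration only; this is not the dcst implementation, which may differ in detail.

```python
import numpy as np


def ecdf_formal_sketch(x, data):
    """Hypothetical stand-in for a formal ECDF evaluator.

    Returns, for each value in x, the fraction of entries in data
    that are less than or equal to that value.
    """
    data = np.sort(data)
    # side="right" counts entries <= x, which matches the ECDF definition
    return np.searchsorted(data, x, side="right") / len(data)


# Assumed example: samples drawn from a standard Normal distribution
rng = np.random.default_rng(seed=3252)
samples = rng.normal(size=10_000)

# Evaluate the sample-based CDF at a few points
approx_cdf = ecdf_formal_sketch(np.array([-1.0, 0.0, 1.0]), samples)
print(approx_cdf)
```

With enough samples, the printed values should sit close to the theoretical Normal CDF at those points (roughly 0.16, 0.50, and 0.84), which is why samples from the theoretical distribution can replace an analytical CDF here.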
This exercise is part of the course
Case Studies in Statistical Thinking
Exercise instructions
- Compute the values of the concave corners of the formal ECDF for data1 using dcst.ecdf(). Store the results in the variables x and y.
- Use dcst.ecdf_formal() to compute the values of the theoretical CDF, determined from data2, at the corner positions x. Store the result in the variable cdf.
- Compute the distances between the concave corners of the formal ECDF and the theoretical CDF. Store the result as D_top.
- Compute the distances between the convex corners of the formal ECDF and the theoretical CDF. Note that you will need to subtract 1/len(data1) from y to get the y-value at the convex corner. Store the result in D_bottom.
- Return the K-S statistic as the maximum of all entries in D_top and D_bottom. You can pass D_top and D_bottom together as a tuple to np.max() to do this.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def ks_stat(data1, data2):
    # Compute ECDF from data: x, y
    ____

    # Compute corresponding values of the target CDF
    cdf = ____

    # Compute distances between concave corners and CDF
    D_top = ____ - ____

    # Compute distance between convex corners and CDF
    D_bottom = ____ - ____ + ____/____

    return np.max((D_top, D_bottom))
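
The steps in the instructions can be sketched end to end as follows. To keep the example self-contained, it uses two NumPy helpers, ecdf_sketch and ecdf_formal_sketch, which are hypothetical stand-ins for dcst.ecdf() and dcst.ecdf_formal(); their behavior is an assumption based on the exercise description, and the test data are invented for illustration.

```python
import numpy as np


def ecdf_sketch(data):
    """Assumed stand-in for dcst.ecdf(): sorted data and ECDF heights i/n."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def ecdf_formal_sketch(x, data):
    """Assumed stand-in for dcst.ecdf_formal(): fraction of data <= x."""
    return np.searchsorted(np.sort(data), x, side="right") / len(data)


def ks_stat(data1, data2):
    # Compute ECDF from data: x, y (y holds the concave-corner heights)
    x, y = ecdf_sketch(data1)

    # Compute corresponding values of the target CDF
    cdf = ecdf_formal_sketch(x, data2)

    # Compute distances between concave corners and CDF
    D_top = y - cdf

    # Compute distance between convex corners and CDF;
    # the convex corner sits 1/len(data1) below the concave one
    D_bottom = cdf - y + 1 / len(data1)

    return np.max((D_top, D_bottom))


# Illustrative usage with invented data: compare a small Normal sample
# against many draws from the same (assumed) theoretical distribution
rng = np.random.default_rng(seed=42)
data1 = rng.normal(size=50)
data2 = rng.normal(size=10_000)
print(ks_stat(data1, data2))
```

Checking both corners matters because the ECDF is a staircase: the largest gap to the theoretical CDF can occur just above or just below a step, so the statistic is the maximum over both sets of distances.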