Computing the K-S statistic

Write a function to compute the Kolmogorov-Smirnov statistic from two datasets, data1 and data2, where data2 consists of samples from the theoretical distribution you are comparing your data to. Note that this means we are using hacker stats to compute the K-S statistic for a dataset and a theoretical distribution, not the K-S statistic for two empirical datasets. Conveniently, the function you just selected for computing values of the formal ECDF is available as dcst.ecdf_formal().

This exercise is part of the course

Case Studies in Statistical Thinking

Exercise instructions

  • Compute the values of the concave corners of the formal ECDF for data1 using dcst.ecdf(). Store the results in the variables x and y.
  • Use dcst.ecdf_formal() to compute the values of the theoretical CDF, determined from data2, at the x-values of the corners, x. Store the result in the variable cdf.
  • Compute the distances between the concave corners of the formal ECDF and the theoretical CDF. Store the result as D_top.
  • Compute the distances between the convex corners of the formal ECDF and the theoretical CDF. Note that you will need to subtract 1/len(data1) from y to get the y-value at each convex corner. Store the result as D_bottom.
  • Return the K-S statistic as the maximum of all entries in D_top and D_bottom. You can pass D_top and D_bottom together as a tuple to np.max() to do this.
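The corner arithmetic in the steps above can be checked by hand on a tiny example. The sketch below is only an illustration: the three data points are made up, and the theoretical CDF is assumed to be that of a Uniform(0, 1) distribution (so F(x) = x) instead of being estimated from samples with dcst.ecdf_formal().

```python
import numpy as np

# Tiny made-up dataset, used only to illustrate the corner arithmetic.
data1 = np.array([0.5, 0.2, 0.9])
n = len(data1)

# Sorted x-values and the ECDF height at the top of each step (i/n).
x = np.sort(data1)
y = np.arange(1, n + 1) / n

# Assumption for this sketch: the theoretical distribution is Uniform(0, 1),
# whose CDF is F(x) = x on [0, 1], so no sampling is needed here.
cdf = x

# Distance from the top of each step (the y-values) down to the CDF.
D_top = y - cdf

# Distance from the CDF down to the bottom of each step (y - 1/n).
D_bottom = cdf - y + 1 / n

# The K-S statistic is the largest of all these distances.
ks = np.max((D_top, D_bottom))
print(D_top, D_bottom, ks)
```

Here the largest distance comes from the bottom of the last step: the CDF is 0.9 there, while the ECDF just before the jump is 2/3.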

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

import numpy as np
import dc_stat_think as dcst


def ks_stat(data1, data2):
    # Compute ECDF from data: x, y
    x, y = ____

    # Compute corresponding values of the target CDF
    cdf = ____

    # Compute distances between concave corners and CDF
    D_top = ____ - ____

    # Compute distances between convex corners and CDF
    D_bottom = ____ - ____ + ____/____

    return np.max((D_top, D_bottom))
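
One possible way to fill in the blanks is sketched below. So that it runs without the course package, it uses two small NumPy stand-ins, ecdf() and ecdf_formal(). These are assumptions about how dcst.ecdf() and dcst.ecdf_formal() behave, not their actual source. In the exercise itself you would call the dcst versions instead.

```python
import numpy as np


def ecdf(data):
    """Stand-in for dcst.ecdf(): sorted data and ECDF heights i/n."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def ecdf_formal(x, data):
    """Stand-in for dcst.ecdf_formal(): fraction of `data` that is <= x."""
    return np.searchsorted(np.sort(data), x, side="right") / len(data)


def ks_stat(data1, data2):
    # Compute ECDF from data: x, y
    x, y = ecdf(data1)

    # Compute corresponding values of the target CDF
    cdf = ecdf_formal(x, data2)

    # Compute distances between concave corners and CDF
    D_top = y - cdf

    # Compute distances between convex corners and CDF
    D_bottom = cdf - y + 1 / len(data1)

    return np.max((D_top, D_bottom))


# Usage: samples from the same distribution should give a small statistic,
# while a shifted dataset should give a much larger one.
rng = np.random.default_rng(0)
same = ks_stat(rng.normal(size=200), rng.normal(size=10_000))
shifted = ks_stat(rng.normal(loc=2.0, size=200), rng.normal(size=10_000))
print(same, shifted)
```

The more samples data2 contains, the closer its formal ECDF is to the true theoretical CDF, so a large data2 is the natural choice for this hacker-stats approach.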