
Computing the K-S statistic

Write a function to compute the Kolmogorov-Smirnov (K-S) statistic from two datasets, data1 and data2, where data2 consists of samples drawn from the theoretical distribution you are comparing your data to. This means we are using hacker stats to compute the K-S statistic between a dataset and a theoretical distribution, not the K-S statistic between two empirical datasets. Conveniently, the function you just selected for computing values of the formal ECDF is available as dcst.ecdf_formal().
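The exercise relies on two helpers from the course's `dc_stat_think` package (imported as `dcst`). As a hedged sketch, assuming `dcst.ecdf()` returns the sorted data together with step heights i/n, and `dcst.ecdf_formal(x, data)` returns the fraction of `data` that is less than or equal to each value in `x`, NumPy-only stand-ins might look like this. The functions below are local re-implementations for illustration, not the package's source code.

```python
import numpy as np


def ecdf(data):
    """Sorted data and ECDF heights i/n (assumed behavior of dcst.ecdf)."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def ecdf_formal(x, data):
    """Formal (step-function) ECDF of `data` evaluated at the points `x`.

    Assumed behavior of dcst.ecdf_formal: side='right' counts every sample
    <= each x, so the value at a data point already includes that point
    (a right-continuous step function).
    """
    return np.searchsorted(np.sort(data), x, side="right") / len(data)


data = np.array([3.0, 1.0, 2.0])
x, y = ecdf(data)                      # x: 1, 2, 3; y: 1/3, 2/3, 1
values = ecdf_formal(np.array([0.5, 2.0, 2.5, 3.0]), data)
print(values)                          # 0, 2/3, 2/3, 1
```

Evaluating the formal ECDF between data points (2.5 here) returns the height of the step to its left, which is what makes it usable as a stand-in for a theoretical CDF when data2 is large.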

This exercise is part of the course

Case Studies in Statistical Thinking


Exercise instructions

  • Compute the x- and y-values of the concave corners (the tops of the steps) of the formal ECDF for data1 using dcst.ecdf(). Store the results in the variables x and y.
  • Use dcst.ecdf_formal() to compute the values of the theoretical CDF, determined from data2, at the corner x-values x. Store the result in the variable cdf.
  • Compute the distances between the concave corners of the formal ECDF and the theoretical CDF. Store the result as D_top.
  • Compute the distances between the convex corners (the bottoms of the steps) of the formal ECDF and the theoretical CDF. Note that you will need to subtract 1/len(data1) from y to get the y-value at each convex corner. Store the result in D_bottom.
  • Return the K-S statistic as the maximum of all entries in D_top and D_bottom. You can pass D_top and D_bottom together as a tuple to np.max() to do this.
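To see why both sets of corners are needed, here is a tiny worked example. As an illustrative assumption, it compares three data points against an exact Uniform(0, 4) CDF, F(x) = x/4, in place of the samples in data2, so every distance can be checked by hand.

```python
import numpy as np

# Illustrative assumption: an exact Uniform(0, 4) CDF, F(x) = x / 4,
# stands in for the theoretical CDF that data2 would approximate.
data1 = np.array([1.0, 2.0, 3.0])
n = len(data1)

x = np.sort(data1)
y = np.arange(1, n + 1) / n        # concave corners (step tops): 1/3, 2/3, 1
cdf = x / 4                        # theoretical CDF at the corners: 0.25, 0.5, 0.75

D_top = y - cdf                    # ECDF above the CDF at each step top
D_bottom = cdf - (y - 1 / n)       # CDF above the ECDF at each step bottom
                                   # (same as cdf - y + 1/n)

ks = np.max((D_top, D_bottom))
print(D_top)       # approximately [0.0833, 0.1667, 0.25]
print(D_bottom)    # approximately [0.25, 0.1667, 0.0833]
print(ks)          # 0.25
```

The maximum gap of 0.25 shows up twice: once just before x = 1, where the ECDF is still 0 but F has reached 0.25 (a convex corner, caught by D_bottom), and once at x = 3, where the ECDF jumps to 1 while F is only 0.75 (a concave corner, caught by D_top). Checking only the step tops would miss the first of these.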

Hands-on interactive exercise

Sample code for the exercise:

import numpy as np
import dc_stat_think as dcst

def ks_stat(data1, data2):
    # Compute ECDF from data: x, y
    x, y = dcst.ecdf(data1)

    # Compute corresponding values of the target CDF
    cdf = dcst.ecdf_formal(x, data2)

    # Compute distances between concave corners and CDF
    D_top = y - cdf

    # Compute distances between convex corners and CDF
    # (y - 1/len(data1) is the height at the bottom of each step)
    D_bottom = cdf - y + 1/len(data1)

    return np.max((D_top, D_bottom))
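Because `dc_stat_think` may not be installed everywhere, the following self-contained sketch inlines what the two `dcst` helpers are assumed to compute and applies the corner-based statistic to hacker-stats samples. The exponential distribution, its scale, the sample sizes, and the function names `ks_stat_numpy` and `ks_brute_force` are all hypothetical choices for illustration. The brute-force function evaluates |F1 - F2| at every pooled sample point as a sanity check on the corner logic.

```python
import numpy as np


def ks_stat_numpy(data1, data2):
    """Corner-based K-S statistic (NumPy-only stand-in for ks_stat)."""
    n = len(data1)
    x = np.sort(data1)                             # corner x-values
    y = np.arange(1, n + 1) / n                    # step tops (concave corners)
    # Assumed equivalent of dcst.ecdf_formal(x, data2)
    cdf = np.searchsorted(np.sort(data2), x, side="right") / len(data2)
    D_top = y - cdf                                # concave corners vs CDF
    D_bottom = cdf - y + 1 / n                     # convex corners vs CDF
    return np.max((D_top, D_bottom))


def ks_brute_force(data1, data2):
    """max |F1 - F2| over all pooled sample points.

    Valid when data1 and data2 share no values: both ECDFs are
    right-continuous steps, so their gap is constant between pooled points.
    """
    pooled = np.concatenate((data1, data2))
    F1 = np.searchsorted(np.sort(data1), pooled, side="right") / len(data1)
    F2 = np.searchsorted(np.sort(data2), pooled, side="right") / len(data2)
    return np.max(np.abs(F1 - F2))


rng = np.random.default_rng(42)
data1 = rng.exponential(scale=2.0, size=100)       # "observed" data
data2 = rng.exponential(scale=2.0, size=10_000)    # samples from the theoretical distribution

d = ks_stat_numpy(data1, data2)
print(d, ks_brute_force(data1, data2))             # the two values should agree
```

The corner version needs only 2 × len(data1) comparisons, whereas the brute-force check touches every pooled point. That is the practical reason to reason about corners: the supremum of the gap between a step function and a nondecreasing CDF can only occur at the top or bottom of a step.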