Computing the value of a formal ECDF
To perform the Kolmogorov-Smirnov test, we need to compute the value of a formal ECDF at arbitrary points. In other words, we need a function, ecdf_formal(x, data), that returns the value of the formal ECDF derived from the dataset data for each value in the array x. Two of the functions below accomplish this; one does not. Of the two that compute it correctly, one is faster. Label each.
As a reminder, the ECDF is formally defined as ECDF(x) = (number of samples ≤ x) / (total number of samples). You may also want to read the docstring of np.searchsorted(), paying attention to its side keyword argument. NumPy is assumed to be imported as np.
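The definition above can be translated almost literally into code, which gives a slow but easy-to-verify baseline for checking any candidate. The sketch below is not one of the answer choices; the name ecdf_reference and the small test arrays are made up for illustration. Note that the data deliberately contains a repeated value that also appears in x, since ties are where the "≤" in the definition matters.

```python
import numpy as np


def ecdf_reference(x, data):
    """Brute-force formal ECDF: fraction of samples <= each value in x.

    Hypothetical helper for checking candidates; not part of the exercise.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    # Compare every x value with every sample: shape (len(x), len(data))
    counts = (data[np.newaxis, :] <= x[:, np.newaxis]).sum(axis=1)
    return counts / len(data)


# Illustrative data with a tie (2.0 appears twice and is also queried)
data = np.array([3.0, 1.0, 2.0, 2.0])
x = np.array([0.0, 2.0, 2.5, 3.0])

# Counts of samples <= x are 0, 3, 3, 4, so the values are 0, 0.75, 0.75, 1
print(ecdf_reference(x, data))
```

A candidate ecdf_formal can then be compared against this baseline with np.allclose(ecdf_formal(x, data), ecdf_reference(x, data)) on data that contains ties.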
a)
def ecdf_formal(x, data):
    return np.searchsorted(np.sort(data), x) / len(data)
b)
def ecdf_formal(x, data):
    return np.searchsorted(np.sort(data), x, side='right') / len(data)
c)
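def ecdf_formal(x, data):
    output = np.empty(len(x))
    data = np.sort(data)
    # enumerate() supplies the index i alongside each value of x
    for i, x_val in enumerate(x):
        j = 0
        # Walk through the sorted data while entries are <= x_val
        while j < len(data) and x_val >= data[j]:
            j += 1
        output[i] = j
    return output / len(data)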
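To decide which of the correct candidates is faster, the functions can be timed on the same inputs. The following is a minimal sketch using the standard-library timeit module; the helper name time_ecdf, the stand-in function candidate, and the synthetic data are assumptions made for illustration. In practice, candidate would be replaced by each version of ecdf_formal in turn.

```python
import timeit

import numpy as np


def time_ecdf(func, x, data, repeats=5):
    """Return the best wall-clock time (seconds) of func(x, data) over several runs."""
    timings = timeit.repeat(lambda: func(x, data), number=1, repeat=repeats)
    return min(timings)


def candidate(x, data):
    """Stand-in ECDF implementation; swap in each ecdf_formal to compare them."""
    return np.mean(data[np.newaxis, :] <= x[:, np.newaxis], axis=1)


# Synthetic inputs, seeded so that every candidate sees identical data
rng = np.random.default_rng(0)
data = rng.normal(size=1000)
x = np.linspace(-3.0, 3.0, 200)

best = time_ecdf(candidate, x, data)
print(f"best of 5 runs: {best:.6f} s")
```

Taking the minimum over several runs reduces the influence of background activity on the machine; absolute timings will still vary from system to system, so only the relative ordering of the candidates is meaningful.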
This exercise is part of the course
Case Studies in Statistical Thinking