Computing the value of a formal ECDF
To be able to do the Kolmogorov-Smirnov test, we need to compute the value of a formal ECDF at arbitrary points. In other words, we need a function, ecdf_formal(x, data)
that returns the value of the formal ECDF derived from the dataset data
for each value in the array x
. Two of the functions accomplish this. One will not. Of the two that do the calculation correctly, one is faster. Label each.
As a reminder, the ECDF is formally defined as ECDF(x) = (number of samples ≤ x) / (total number of samples). You also might want to check out the doc string of np.searchsorted()
.
a)
def ecdf_formal(x, data):
return np.searchsorted(np.sort(data), x) / len(data)
b)
def ecdf_formal(x, data):
return np.searchsorted(np.sort(data), x, side='right') / len(data)
c)
def ecdf_formal(x, data):
output = np.empty(len(x))
data = np.sort(data)
for i, x_val in x:
j = 0
while j < len(data) and x_val >= data[j]:
j += 1
output[i] = j
return output / len(data)
This exercise is part of the course
Case Studies in Statistical Thinking
Hands-on interactive exercise
Turn theory into action with one of our interactive exercises
