Evaluating distribution fit for the ldl variable
In this exercise, you'll focus on one variable of the diabetes dataset dia: ldl, the low-density lipoprotein blood serum measurement. You'll use the additional information from a Kolmogorov–Smirnov test to decide whether the normal distribution is still a good choice for ldl.
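For reference, the one-sample Kolmogorov–Smirnov statistic measures the largest vertical gap between two curves. The first is the empirical CDF $F_n$ of the $n$ observations $x_1, \dots, x_n$. The second is the CDF $F$ of the candidate distribution:

$$
D_n = \sup_x \left| F_n(x) - F(x) \right|, \qquad F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}.
$$

A small $D_n$ (and a correspondingly large p-value) means the data give no strong evidence against the candidate distribution. A large $D_n$ (small p-value) suggests a poor fit.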
The dia DataFrame has been loaded for you. The following libraries have also been imported: pandas as pd, numpy as np, and scipy.stats as st.
This exercise is part of the course Monte Carlo Simulations in Python.
Exercise instructions
- Define a list called list_of_dists containing your candidate distributions: Laplace, normal, and exponential (in that order). Use the correct names from scipy.stats.
- Inside the loop, fit the data with the corresponding probability distribution, saving the fitted parameters as param.
- Perform a Kolmogorov–Smirnov test to evaluate goodness-of-fit, saving the test output as result.
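In scipy.stats, the three candidates are named laplace, norm, and expon. Looking each name up with getattr and calling its .fit method returns a tuple of maximum-likelihood parameter estimates, which is (loc, scale) for all three. The minimal sketch below shows that step in isolation. It fits a synthetic normal sample that stands in for dia["ldl"], since the real data isn't reproduced here. The mean of 115 and standard deviation of 30 are assumptions chosen only for illustration.

```python
import numpy as np
import scipy.stats as st

# scipy.stats names for the candidates, in the order the exercise asks for
list_of_dists = ["laplace", "norm", "expon"]

# Synthetic stand-in for dia["ldl"] (an assumption: illustrative values only)
rng = np.random.default_rng(0)
ldl = rng.normal(loc=115.0, scale=30.0, size=400)

fitted = {}
for name in list_of_dists:
    dist = getattr(st, name)   # resolves e.g. "norm" to st.norm
    param = dist.fit(ldl)      # maximum-likelihood estimates of (loc, scale)
    fitted[name] = param
    print(name, param)
```

For norm, the fitted loc is the sample mean and the fitted scale is the population standard deviation (ddof=0). You can check this against ldl.mean() and ldl.std().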
Have a go at this exercise by completing this sample code.
# List candidate distributions to evaluate
list_of_dists = [____]
for i in list_of_dists:
    dist = getattr(st, i)
    # Fit the data to the probability distribution
    param = dist.____
    # Perform the ks test to evaluate goodness-of-fit
    result = ____
    print(result)
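One way the completed loop might look is sketched below. Because the exercise's dia DataFrame isn't available outside the course, the sketch builds a hypothetical stand-in with a single normally distributed "ldl" column, so its numbers say nothing about the real data. st.kstest accepts the distribution's name as a string for the CDF and passes the fitted parameters through args.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Hypothetical stand-in for the exercise's `dia` DataFrame (illustration only)
rng = np.random.default_rng(42)
dia = pd.DataFrame({"ldl": rng.normal(loc=115.0, scale=30.0, size=400)})

# Candidate distributions, using their scipy.stats names
list_of_dists = ["laplace", "norm", "expon"]
results = {}
for i in list_of_dists:
    dist = getattr(st, i)
    # Fit the data to the probability distribution: returns (loc, scale)
    param = dist.fit(dia["ldl"])
    # KS test of the data against the fitted distribution's CDF
    result = st.kstest(dia["ldl"], i, args=param)
    results[i] = result
    print(i, result)
```

Because the parameters are estimated from the same data being tested, the standard KS p-values are only approximate and tend to be too large. Comparing the statistics across candidates is still informative. On this synthetic sample, the exponential fit should show a far larger statistic than the normal fit.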