Evaluating distribution fit for the ldl variable

In this exercise, you'll focus on one variable of the diabetes dataset dia: the ldl blood serum measurement. You'll determine whether the normal distribution is still a good choice for ldl based on the additional information provided by a Kolmogorov–Smirnov test.

The dia DataFrame has been loaded for you. The following libraries have also been imported: pandas as pd, numpy as np, and scipy.stats as st.

This exercise is part of the course

Monte Carlo Simulations in Python

Exercise instructions

  • Define a list called list_of_dists containing your candidate distributions: Laplace, normal, and exponential (in that order); use the correct names from scipy.stats.
  • Inside the loop, fit the corresponding probability distribution to the data, saving the fitted parameters as param.
  • Perform a Kolmogorov–Smirnov test to evaluate goodness-of-fit, saving the results as result.
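The fit-then-test pattern described in these steps can be sketched for a single distribution. This is a minimal illustration rather than the exercise solution: the dia DataFrame is not available here, so `sample` is a synthetic, hypothetical stand-in generated from a normal distribution with made-up parameters.

```python
import numpy as np
import scipy.stats as st

# Synthetic stand-in for an ldl-like column (an assumption, not the real dia data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=115.0, scale=30.0, size=400)

# fit() returns maximum-likelihood estimates; for the normal these are (loc, scale)
param = st.norm.fit(sample)

# kstest compares the sample with the named distribution, using the fitted parameters
result = st.kstest(sample, "norm", args=param)

print(param)
print(result.statistic, result.pvalue)
```

A small KS statistic (and a large p-value) indicates that the empirical distribution of the sample stays close to the fitted distribution. Because the parameters are estimated from the same data being tested, the p-value is best read as a rough guide rather than an exact probability.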

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# List candidate distributions to evaluate
list_of_dists = [____]
for i in list_of_dists:
    dist = getattr(st, i)
    # Fit the data to the probability distribution
    param = dist.____
    # Perform the ks test to evaluate goodness-of-fit
    result = ____
    print(result)
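One possible way to run this kind of loop end to end is sketched below, under stated assumptions: `ldl_like` is synthetic data standing in for the real ldl column, and the `ks_stats` dictionary and `best` variable are hypothetical additions for comparing the candidates, not part of the exercise template.

```python
import numpy as np
import scipy.stats as st

# Hypothetical stand-in for the ldl column (the real dia DataFrame is not loaded here)
rng = np.random.default_rng(1)
ldl_like = rng.normal(loc=115.0, scale=30.0, size=400)

# scipy.stats names for the Laplace, normal, and exponential distributions
list_of_dists = ["laplace", "norm", "expon"]

ks_stats = {}
for name in list_of_dists:
    dist = getattr(st, name)
    # Fit the candidate distribution to the data
    param = dist.fit(ldl_like)
    # Kolmogorov-Smirnov test against the fitted candidate
    result = st.kstest(ldl_like, name, args=param)
    ks_stats[name] = result.statistic
    print(name, result)

# The candidate with the smallest KS statistic tracks the data most closely
best = min(ks_stats, key=ks_stats.get)
print(best)
```

Comparing the KS statistics across candidates gives a quick ranking: on this synthetic, symmetric sample the exponential fit is far worse than the normal fit, since the exponential distribution is strongly right-skewed.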