1. Fitting the Weibull model
In this video, we'll learn our first parametric model for survival analysis - the Weibull model!
2. Probability distributions
A probability distribution is a function that describes the probability of different outcomes. You may know the Normal distribution, which is commonly used to model sample means.
3. Probability distributions
...Or the Uniform distribution, which describes an equal probability of all outcomes.
4. Introducing the Weibull distribution
The Weibull distribution is another probability distribution. It's well suited to modeling time-to-event data. To describe it, we need two parameters: lambda and k.
5. Introducing the Weibull distribution
K is the shape parameter. Keeping lambda constant, k determines which shape the distribution takes on. Lambda is the scale parameter. Keeping k constant, lambda determines how tall and wide the distribution is.
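To make the two parameters concrete, here is a minimal Python sketch of the Weibull density. It assumes the common parameterization f(t) = (k/lambda) * (t/lambda)^(k-1) * exp(-(t/lambda)^k). The function name weibull_pdf and the example values are illustrative, not taken from the course material.

```python
import math

def weibull_pdf(t, lam, k):
    """Weibull density with scale lam and shape k, for t > 0 (assumed parameterization)."""
    z = t / lam
    return (k / lam) * z ** (k - 1) * math.exp(-(z ** k))

# For k = 2 the density peaks at t = lam / sqrt(2).
# Keep the shape fixed and compare the peak height for two scales.
narrow_peak = weibull_pdf(1.0 / math.sqrt(2), lam=1.0, k=2.0)
wide_peak = weibull_pdf(2.0 / math.sqrt(2), lam=2.0, k=2.0)

print(narrow_peak, wide_peak)
```

With k held at 2, doubling lambda moves the peak twice as far along the time axis and halves its height, so the curve becomes wider and shorter.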
6. Fitting the Weibull distribution to data
How does the Weibull distribution model time-to-event data? Imagine a company with a fleet of machines that are prone to failure. We plot each failed machine's time to failure as a histogram.
7. Fitting the Weibull distribution to data
When we superimpose the Weibull distribution, we see it's a decent fit! Now we could use the model to make predictions along the entire time scale.
8. From Weibull distribution to survival function
Using the Weibull distribution parameters, we could derive the corresponding survival function. Keep in mind that k is often replaced with rho in survival functions, but they are the same parameter.
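As a rough sketch of that derivation: the survival function is one minus the cumulative distribution function, which for the Weibull gives S(t) = exp(-(t/lambda)^rho). The helper name weibull_survival and the values lam=10, rho=1.5 below are hypothetical and only there for illustration.

```python
import math

def weibull_survival(t, lam, rho):
    """S(t) = exp(-(t / lam) ** rho): probability the event has not occurred by time t."""
    return math.exp(-((t / lam) ** rho))

# Hypothetical parameters, chosen only to show the shape of the curve.
curve = [weibull_survival(t, lam=10.0, rho=1.5) for t in (0, 5, 10, 20)]
print(curve)
```

The curve starts at 1 (nobody has experienced the event at time 0) and falls toward 0 as time goes on.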
9. The knobs: k and lambda
k and lambda are key to understanding how the survival rate varies over time. Lambda has a straightforward interpretation: it is the time by which 63.2% of the population has experienced the event.
Replacing k with an arbitrary value, say 3, we see that the event rate, also called the hazard, is proportional to a power of x, which is time in this case. And k is that power plus one, so with k equal to 3 the event rate grows with the square of time. Therefore, changing k changes the shape of the distribution.
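Both statements can be checked on paper. The worked equations below assume the standard Weibull survival function and hazard function; the symbol h(t) for the hazard is our notation.

```latex
\begin{align*}
S(t) &= \exp\!\left[-\left(\frac{t}{\lambda}\right)^{k}\right]
  &&\Rightarrow\quad S(\lambda) = e^{-1} \approx 0.368,
  \qquad 1 - S(\lambda) \approx 0.632 \\
h(t) &= \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}
  &&\Rightarrow\quad h(t)\Big|_{k=3} = \frac{3}{\lambda^{3}}\, t^{2}
\end{align*}
```

The first line shows why 63.2% of the population has experienced the event by time lambda, whatever the value of k. The second line shows that the power of time in the event rate is k minus one.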
10. Interpreting k (or $\rho$)
If k is smaller than 1, the power is negative. The survival curve takes on this shape, indicating that the event rate decreases over time.
11. Interpreting k (or $\rho$)
If k is equal to 1, the power is 0. This indicates that the event rate is constant over time.
12. Interpreting k (or $\rho$)
If k is greater than 1, the power is positive, which indicates that the event rate increases over time.
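The three cases can be seen numerically with a small sketch. It assumes the standard Weibull hazard h(t) = (k/lambda) * (t/lambda)^(k-1). The function name weibull_hazard and the chosen values of k are illustrative only.

```python
def weibull_hazard(t, lam, k):
    """Event rate at time t: h(t) = (k / lam) * (t / lam) ** (k - 1)."""
    return (k / lam) * (t / lam) ** (k - 1)

times = [1.0, 2.0, 4.0]

# One k below 1, one equal to 1, one above 1; lam is fixed at 1 for simplicity.
hazards_by_k = {
    k: [weibull_hazard(t, lam=1.0, k=k) for t in times]
    for k in (0.5, 1.0, 2.0)
}

for k, rates in hazards_by_k.items():
    print(f"k={k}: {[round(r, 3) for r in rates]}")
```

Reading across each printed row, the event rate falls for k = 0.5, stays flat for k = 1, and rises for k = 2.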
13. Survival analysis with Weibull distribution
To model time-to-event data with the Weibull model, we use the WeibullFitter class in lifelines.
First, we import the WeibullFitter class. We instantiate an object by calling WeibullFitter. Then we call the dot-fit method with the durations column and the event-observed column to fit the data. After calling dot-fit, we can access attributes like survival_function_, lambda_, rho_, and summary, and methods like dot-predict.
14. Example Weibull model
Let's apply this to mortgage_df, which is a right-censored DataFrame where the event of interest is a full payoff. We will import the WeibullFitter class and instantiate it. Then we fit the WeibullFitter class object with the durations and observation columns.
15. Example Weibull model
After fitting the model, let's plot the survival curve. Note that calling dot-plot on the fitted WeibullFitter object plots a hazard curve (the cumulative hazard in recent lifelines versions), which is a different quantity from the survival curve. To plot the survival function, we can access the survival_function_ attribute and call dot-plot on it. Unlike the Kaplan-Meier estimate, which is a step function, the Weibull survival curve is smooth and continuous.
Let's print the parameter values by accessing the lambda_ attribute and the rho_ attribute. Lambda is 6-point-11 and rho is 0-point-94. Rho is slightly smaller than 1, meaning that the payoff rate decreases over time: the longer a mortgage has gone without being paid off, the lower its rate of getting paid off!
What's the survival rate, or the probability that a mortgage is still not paid off after 20 years? Using the predict method with 20 as input, we get 5% as the answer!
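The 5% figure can be cross-checked by hand. This is a minimal sketch that assumes the survival function S(t) = exp(-(t/lambda)^rho) and plugs in the rounded values lambda = 6.11 and rho = 0.94 quoted above, so the result is only approximate.

```python
import math

lam, rho = 6.11, 0.94  # rounded fitted values quoted in the transcript

# Probability of the mortgage still not being paid off at t = 20.
s20 = math.exp(-((20 / lam) ** rho))

# Probability of still not being paid off at t = lambda (should be about 36.8%).
s_at_lambda = math.exp(-((lam / lam) ** rho))

print(round(s20, 2), round(s_at_lambda, 3))
```

The first value comes out at roughly 0.05, in line with the answer from the predict method, and the second confirms the 63.2% interpretation of lambda.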
16. Let's practice!
Let's practice!