Fitting a Kaplan-Meier estimator

1. Fitting a Kaplan-Meier estimator

The Kaplan-Meier estimator is one of the most widely used methods for estimating survival functions.

2. What is the Kaplan-Meier estimator?

Also known as the product-limit estimator and KM-estimator, the Kaplan-Meier estimator computes survival probabilities and estimates the survival function. As a non-parametric method, it does not assume that survival times follow any particular parametric distribution and constructs the survival curve solely from collected data.

3. The mathematical intuition

Consider events along a timeline t. At each duration time t_i, we can count the number of events that happened at t_i and the number of individuals still at risk just before t_i, meaning those who have neither experienced the event nor been censored yet. We call them d_i and n_i, respectively. d_i divided by n_i is the estimated probability of an event happening at t_i for someone who has survived up to t_i, and 1 minus that is the probability of surviving t_i. Multiplying these results for every time up to t, we obtain the probability of an individual surviving all these times. Despite how it looks, the math behind the Kaplan-Meier estimator is built on the product rule of probability and is fairly straightforward.
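Written compactly, and using the t_i, d_i, and n_i described above, the estimator is usually stated as the product below. The hat on S is conventional notation for an estimate and does not come from this transcript.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)
```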

4. Why is it called the product-limit estimator?

To illustrate this, suppose we have a dataset with 3 durations: 1, 2, and 3. For t equals 2, the survival probability is the product of the probability of an individual surviving at 1 and that of an individual surviving at 2. For t equals 3, the survival probability is the probability of an individual surviving 1, 2, and 3, which is the survival probability at t equals 2 times the probability of an individual surviving at 3. This intuitive property is behind the name "the product-limit estimator". The survival probability at time t equals the product of the probabilities of surviving at time t and at each prior time.
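The running product can be sketched in a few lines of plain Python. This is a minimal illustration, not how lifelines implements it: the function name kaplan_meier_by_hand and the toy data are made up for this example, and an event flag of 1 is assumed to mean the event was observed while 0 means censored.

```python
def kaplan_meier_by_hand(durations, events):
    """Return {t_i: S(t_i)} for each distinct duration, using the product-limit rule."""
    survival = {}
    running_product = 1.0
    for t in sorted(set(durations)):
        # n_i: individuals still at risk just before t (duration >= t)
        n_i = sum(1 for d in durations if d >= t)
        # d_i: events observed exactly at t (censored rows do not count)
        d_i = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        running_product *= 1 - d_i / n_i
        survival[t] = running_product
    return survival


# Hypothetical toy data: five subjects, two of them censored.
toy_durations = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]

curve = kaplan_meier_by_hand(toy_durations, toy_events)
print({t: round(s, 3) for t, s in curve.items()})
# {1: 0.8, 2: 0.6, 3: 0.3, 4: 0.3}
```

Each value is the previous value times the probability of surviving the current time, which is why the curve can only stay flat or step down.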

5. Assumptions to keep in mind

Keep in mind that when we use the Kaplan-Meier estimator, the data must satisfy these assumptions. The events are unambiguously defined. Subjects have the same survival probabilities regardless of when they enter the study. And censoring is non-informative, meaning that being censored tells us nothing about an individual's survival probability.

6. Kaplan-Meier estimator with lifelines

To model data with the Kaplan-Meier estimator, first, we run from lifelines import KaplanMeierFitter. Then let's create an instance of the KaplanMeierFitter class and call it kmf. We could then run dot-fit on kmf with the durations and the censorship data, which flags whether each event was observed. These values may be DataFrame columns, NumPy arrays, or lists.

7. The mortgage problem example

For example, let's revisit the mortgage problem. We are measuring time till payoff for mortgages. How do we construct a survival function using the Kaplan-Meier estimator?

8. The mortgage problem example

After importing KaplanMeierFitter from lifelines, we create an instance of the KaplanMeierFitter class. We then pass the duration column as the durations parameter and the paid_off column as the event_observed parameter. The console output will specify that a KM-estimator has been fitted, along with the number of total observations and the number of censored observations.

9. Using the Kaplan-Meier estimator

The fitted KaplanMeierFitter object contains valuable information. Dot-median_survival_time_ returns the median survival time of the survival function. To see all survival probabilities, dot-survival_function_, with a trailing underscore, returns a table of survival probabilities at each time.

10. Using the Kaplan-Meier estimator

To calculate the survival probability for a specific time, we could use dot-predict.

11. Benefits and limitations

The Kaplan-Meier estimator has many benefits, including being intuitive, simple, and flexible to use on any time-to-event data. These benefits make it attractive as the first model to attempt on new data. However, the survival curve is a step function rather than a smooth curve, and a median survival time cannot be calculated if the curve never drops to 50%, which typically happens when more than 50% of the data is censored. Lastly, if we want to analyze how covariates affect survival functions, we need other models. We will learn more about this in later videos.

12. Let's practice!

Let's practice!