Visualizing your Kaplan-Meier model

1. Visualizing your Kaplan-Meier model

Let's deep dive into the survival curve of the Kaplan-Meier estimator.

2. How to construct a Kaplan-Meier survival curve?

Consider a dataset with 5 subjects. The duration column records how long each subject was observed, and the observed column indicates whether the event occurred or the subject was censored. To construct the survival function from its formula, we first arrange the durations in increasing order. When durations are tied, it's helpful to put censored data after uncensored data. The second step is to calculate d_i, n_i, and the percentage chance of survival at each time. Lastly, we multiply the percentage chances together to obtain the survival probabilities.
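For reference, the formula these steps compute is the standard Kaplan-Meier estimator. It multiplies, over every event time t_i up to t, the chance of surviving past t_i given that a subject was still at risk there:

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Here d_i is the number of events at time t_i and n_i is the number of subjects at risk at t_i.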

3. How to construct a Kaplan-Meier survival curve?

We will use the plus sign to denote censored data.

4. How to construct a Kaplan-Meier survival curve?

Each duration is arranged from small to large, and at tied durations, censored data comes after uncensored data.
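The ordering rule can be sketched in plain Python. The five durations and censorship flags are hypothetical values chosen to reproduce the d_i and n_i counts described in this lesson; the original slide's data may differ. Sorting on the key (duration, -observed) places an uncensored subject (observed=1) before a censored one (observed=0) at the same duration.

```python
# Hypothetical dataset: 5 subjects, chosen to match the counts in this lesson.
# duration = time observed; observed = 1 if the event happened, 0 if censored.
durations = [5, 2, 3, 5, 2]
observed = [1, 0, 1, 0, 1]

# Sort by duration; on ties, put uncensored (observed=1) before censored
# (observed=0). Sorting on (duration, -observed) does this because -1 < 0.
records = sorted(zip(durations, observed), key=lambda r: (r[0], -r[1]))

# Mark censored durations with a trailing "+", as in the lesson.
labels = [f"{d}" if e == 1 else f"{d}+" for d, e in records]
print(labels)  # ['2', '2+', '3', '5', '5+']
```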

5. How to construct a Kaplan-Meier survival curve?

Next, we calculate d_i and n_i.

6. How to construct a Kaplan-Meier survival curve?

d_i is the number of events observed at time i. Exactly one event is observed at each of the 3 event times, so d_i is 1 for all of them. A censored subject counts as 0 events.

7. How to construct a Kaplan-Meier survival curve?

n_i is the number of subjects at risk at time i, meaning the subjects who have neither experienced the event nor been censored before time i. At time=2, all 5 subjects are at risk. At time=3, 3 subjects are at risk: one had the event at time 2 and one was censored before time 3. At time=5, 2 subjects are at risk. A subject censored at a prior time is no longer at risk.
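Counting d_i and n_i can be sketched with a short loop over the distinct event times. The data are the same hypothetical five subjects, already sorted, with censored durations 2+ and 5+. n_i counts every subject whose duration is at least t_i. This matches the convention of putting tied censored data after uncensored data, because a subject censored exactly at t_i is still counted as at risk there.

```python
# Hypothetical sorted data matching the lesson's counts: 2, 2+, 3, 5, 5+.
durations = [2, 2, 3, 5, 5]
observed = [1, 0, 1, 1, 0]  # 1 = event observed, 0 = censored

# Distinct times at which at least one event was observed.
event_times = sorted({d for d, e in zip(durations, observed) if e == 1})

table = []
for t in event_times:
    # d_i: events exactly at t (censored subjects contribute 0 events).
    d_i = sum(1 for d, e in zip(durations, observed) if d == t and e == 1)
    # n_i: subjects still at risk at t, i.e. with duration >= t.
    # Anyone censored before t has already left the risk set.
    n_i = sum(1 for d in durations if d >= t)
    table.append((t, d_i, n_i))

for t, d_i, n_i in table:
    print(f"t={t}: d_i={d_i}, n_i={n_i}")
```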

8. How to construct a Kaplan-Meier survival curve?

Given both d_i and n_i, we can calculate the percentage chance of survival at each time as 1 - d_i/n_i.

9. How to construct a Kaplan-Meier survival curve?

Lastly, we multiply each percentage chance of survival by those of all prior times to obtain the survival probabilities.
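The last two steps amount to a running product. This sketch starts from the (t_i, d_i, n_i) counts of the hypothetical five-subject example and uses Python's fractions module so that the exact values are visible: 4/5, then 4/5 × 2/3 = 8/15, then 8/15 × 1/2 = 4/15.

```python
from fractions import Fraction

# (event time, d_i, n_i) from the hypothetical 5-subject example.
table = [(2, 1, 5), (3, 1, 3), (5, 1, 2)]

survival = []
s = Fraction(1)
for t, d_i, n_i in table:
    p_i = 1 - Fraction(d_i, n_i)  # chance of surviving past t, given at risk at t
    s *= p_i                      # multiply by the chances at all earlier times
    survival.append((t, p_i, s))

for t, p_i, s in survival:
    print(f"t={t}: p_i={p_i}, S(t)={s} ~ {float(s):.4f}")
# t=2: p_i=4/5, S(t)=4/5 ~ 0.8000
# t=3: p_i=2/3, S(t)=8/15 ~ 0.5333
# t=5: p_i=1/2, S(t)=4/15 ~ 0.2667
```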

10. How to construct a Kaplan-Meier survival curve?

Plotting the survival probabilities on the y-axis against the durations on the x-axis gives the survival curve, a step function that drops at each event time.
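To see the staircase shape, here is a minimal matplotlib sketch that draws the hand-computed probabilities from the hypothetical example with plt.step. Passing where="post" keeps each probability flat until the next event time, which is how a Kaplan-Meier estimate behaves between events.

```python
import matplotlib.pyplot as plt

# Survival probabilities from the hypothetical 5-subject example
# (S = 1 at time 0, then drops at the event times 2, 3 and 5).
times = [0, 2, 3, 5]
surv = [1.0, 0.8, 8 / 15, 4 / 15]

# where="post": each value holds until the next time point.
plt.step(times, surv, where="post")
plt.xlabel("Duration")
plt.ylabel("Survival probability")
plt.ylim(0, 1.05)
plt.show()
```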

11. Interpreting the survival curve

This survival curve tells us the estimated probability of surviving beyond any given time. A common misconception is that if the survival curve drops to zero, no subjects survived. In fact, the curve drops to zero whenever the longest duration is an observed event rather than a censored one, even if some subjects were censored earlier and were never observed to have the event.

12. Plotting the Kaplan-Meier survival curve

Real-life datasets are much larger, and as we've seen in Chapter 1, there's no need to plot by hand! In addition to the KaplanMeierFitter class, we will also import matplotlib's pyplot module. To plot the Kaplan-Meier survival curve with lifelines, instantiate the KaplanMeierFitter class and fit it to the data. We then call dot-plot on the survival_function_ attribute of the KaplanMeierFitter instance and run plt-dot-show to display the figure.

13. The mortgage problem example

Let's practice on the mortgage problem. After importing the needed libraries and modules, we create a KaplanMeierFitter instance called mortgage_kmf and fit it with the duration and censorship columns. Calling dot-plot on mortgage_kmf's survival_function_ attribute plots the estimated survival function.

14. The mortgage problem example

Don't forget plt-dot-show to display the figure.

15. Survival curve confidence interval

Often, we want to plot a survival curve with its confidence interval. The dot-plot_survival_function method of the KaplanMeierFitter class shows the 95% confidence interval by default.

16. Why is the confidence interval useful?

The confidence interval is useful because it quantifies the uncertainty of each survival probability estimate. A wide confidence interval means the true value could lie in a wider range, so we are less certain about our point estimate; a narrow confidence interval means the opposite. The main factor that influences the width of the band is sample size, more precisely the number of subjects still at risk. This is why the band typically widens at later times as fewer subjects remain.

17. Ways to plot the Kaplan-Meier survival curve

Let's summarize all the ways we learned to plot the survival curve. To plot a continuous line, we can call dot-plot on the survival_function_ attribute. We may also call dot-plot on the KaplanMeierFitter instance to plot a stepped curve. To hide the confidence interval, set the ci_show parameter to False.

18. Ways to plot the Kaplan-Meier survival curve

By default, dot-plot draws the survival curve with its confidence interval. The dot-plot_survival_function method does the same thing as dot-plot.

19. Let's practice!

Let's practice!