1. The Weibull model for estimating smooth survival curves
In this video, we will discuss the Weibull model as an alternative to the Kaplan-Meier estimate.
2. Why use a Weibull model?
Although the Kaplan-Meier estimate is quite powerful and widely used, it serves mostly as a descriptive tool for looking at the data. It also approximates the survival curve with a step function. See those little steps in the graph here? You can think of the Kaplan-Meier curve as something like a histogram: a good way of looking at the data.
The Weibull model produces a survival curve that is smooth rather than a step function. Using a Weibull model instead of a Kaplan-Meier curve is like using a linear model instead of a histogram. You need to make some assumptions about the distribution, but in return the model is helpful for more complex analyses, such as adjusting for covariates and making inferences.
The distribution we assume here is, as the name says, the Weibull distribution. This assumption seems to work quite well in the example you see here. There are, of course, other distributions to choose from, but we will focus on the Weibull distribution because it works well for many problems.
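For reference, here is the Weibull distribution in its common shape/scale form. The symbols k (shape) and λ (scale) are notation chosen for this sketch, not taken from the slides. The survival function is smooth in t, the hazard increases for k > 1 and decreases for k < 1 (k = 1 is the exponential distribution with constant hazard), and the last line gives the time by which a fraction p of patients has experienced the event, which is the kind of quantile computed later in this video.

```latex
\begin{aligned}
S(t) &= \exp\!\left[-\left(\frac{t}{\lambda}\right)^{k}\right], \qquad t \ge 0,\; k > 0,\; \lambda > 0,\\
F(t) &= 1 - S(t),\\
h(t) &= \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1},\\
t_p &= F^{-1}(p) = \lambda\,\bigl(-\log(1-p)\bigr)^{1/k}.
\end{aligned}
```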
We will not go into the details of how to do inference in this course, but we will look at adjusting for covariates in the next chapter.
3. Computing a Weibull model in R
Computing a Weibull model in R can be done with the survreg function from the survival package. Looks quite easy, right?
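As a minimal sketch of such a call, the code below fits an intercept-only Weibull model. The lung data set is an assumption here, chosen only because it ships with the survival package; the data on the slides may differ, so the numbers will not match. The conversion from survreg's log-time parameters to the Weibull shape and scale (shape = 1 / scale, scale = exp(intercept)) follows the survival package's documentation of its distributions.

```r
# Assumption: the lung data from the survival package stands in for the slide data
library(survival)

# Intercept-only ("~ 1") Weibull model; dist = "weibull" is survreg's default,
# spelled out here for clarity
wb <- survreg(Surv(time, status) ~ 1, data = lung, dist = "weibull")
summary(wb)

# survreg reports an intercept and a scale on the log-time scale;
# convert them to the Weibull shape k and scale lambda
wb_shape <- 1 / wb$scale
wb_lambda <- exp(coef(wb)[[1]])
c(shape = wb_shape, lambda = wb_lambda)
```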
4. Computing a Weibull model in R
In fact, it looks quite similar to using survfit to compute a Kaplan-Meier estimate. The only difference is the "reg" (which stands for regression) in survreg instead of the "fit" in survfit.
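To see the parallel, and the difference between a step function and a smooth curve, here is a hedged sketch that fits both models to the same data and compares their estimated survival at a few time points. The lung data, the chosen time points, and the use of pweibull with the shape/scale conversion described in the survival package documentation are assumptions of this sketch, not slide content.

```r
library(survival)

# Kaplan-Meier estimate: non-parametric step function
km <- survfit(Surv(time, status) ~ 1, data = lung)

# Weibull model: same Surv() response and formula, only the function differs
wb <- survreg(Surv(time, status) ~ 1, data = lung)

# Estimated survival at a few time points (in days)
times <- c(100, 300, 500)
km_surv <- summary(km, times = times)$surv

# Weibull survival S(t) = exp(-(t / lambda)^k), with
# k = 1 / survreg scale and lambda = exp(intercept)
wb_surv <- pweibull(times, shape = 1 / wb$scale,
                    scale = exp(coef(wb)[[1]]), lower.tail = FALSE)

data.frame(time = times, kaplan_meier = km_surv, weibull = wb_surv)
```

The two columns should be close if the Weibull assumption fits the data reasonably well; the Kaplan-Meier values only change at observed event times, while the Weibull values change continuously with t.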
5. Computing measures from a Weibull model
Of course, we also want to compute some measures from this model. To compute the time point that 90 percent of patients survive, we can use the predict function. Setting the type argument to "quantile" allows us to compute quantiles of the distribution function. Since the distribution function is 1 minus the survival function, we enter 1 - 0.9 (for 90%) as the quantile argument p. The newdata argument allows us to enter a specific patient. This only really matters when the model contains covariates, which we will discuss in the next chapter; for now, you can just enter what is shown here. To conclude: 90% of patients survive more than 384 days.
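A minimal sketch of this computation, again assuming the lung data from the survival package in place of the slide data (so the result will not be 384 days). It passes newdata = data.frame(1) as a single placeholder "patient", which is enough for a model without covariates, and then cross-checks the prediction against the closed-form Weibull quantile computed with qweibull.

```r
library(survival)

wb <- survreg(Surv(time, status) ~ 1, data = lung)

# Time point that 90% of patients survive: the 10% quantile of
# F(t) = 1 - S(t), hence p = 1 - 0.9
t90 <- predict(wb, type = "quantile", p = 1 - 0.9, newdata = data.frame(1))
t90

# Cross-check: lambda * (-log(1 - p))^(1 / k), via qweibull
t90_check <- qweibull(1 - 0.9, shape = 1 / wb$scale, scale = exp(coef(wb)[[1]]))
t90_check
```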
6. Computing the survival curve from a Weibull model
We can also use the predict function to compute the whole survival function. We just need to enter many quantiles, which gives an approximation that is good enough for plotting. So here we create a grid of survival probabilities from .99 to .01 in steps of .01. The call is the same as before, except that we enter a vector of quantiles instead of a single value. From the result, we can then create a data frame to use for plotting. How exactly to do that will be the topic of the next video.
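Here is a hedged sketch of the grid approach, assuming the lung data from the survival package. The object names (surv, time_pts, surv_wb) are illustrative choices, not taken from the slides, and as.numeric is used only to guarantee a plain vector for the data frame.

```r
library(survival)

wb <- survreg(Surv(time, status) ~ 1, data = lung)

# Grid of survival probabilities from 0.99 down to 0.01
surv <- seq(0.99, 0.01, by = -0.01)

# Matching time points: quantiles of F(t) = 1 - S(t) at p = 1 - surv
time_pts <- predict(wb, type = "quantile", p = 1 - surv, newdata = data.frame(1))

# One row per (time, survival probability) pair, ready for plotting
surv_wb <- data.frame(time = as.numeric(time_pts), surv = surv)
head(surv_wb)
```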
7. Let's practice!
Now it's your turn.