Why learn survival methods?

1. Why do we need special methods for time-to-event data?

In this lesson, we will discuss why we need special methods for survival analysis. Why can't we just fit a linear model?

2. Why survival analysis

The first reason we need particular methods for survival analysis is that duration times are always positive. So we need to work with distributions that can handle positive outcomes. The linear model, for example, assumes normally distributed errors, which means it puts some probability on negative values and is not very appropriate for positive outcomes. A common distribution for modeling duration times is the Weibull distribution, and the corresponding model is called the Weibull model, which we will discuss later in this course. The second reason is that survival analysis focuses on different measures. Historically, the survivor function has been the measure of interest. We will learn about the survivor function in the next lesson. Some other measures, like the hazard function, are also of more interest in survival analysis than in other areas. The last reason we need special methods for survival analysis is probably the most important: censoring.
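To make the point about positive outcomes concrete, here is a minimal sketch in base R. The durations are simulated and purely hypothetical (the shape and scale values are arbitrary choices, not taken from the course data). It fits an intercept-only linear model and checks how much probability the fitted normal distribution places on negative durations.

```r
# Hypothetical durations in minutes, simulated from a Weibull distribution.
# The shape and scale values are arbitrary illustration choices.
set.seed(1)
durations <- rweibull(200, shape = 1.5, scale = 5)

# Weibull draws are always positive
all_positive <- all(durations > 0)

# Intercept-only linear model: a normal distribution with estimated mean and sd
fit <- lm(durations ~ 1)
mu <- coef(fit)[["(Intercept)"]]
sigma <- summary(fit)$sigma

# Probability the fitted normal distribution assigns to negative durations
p_negative <- pnorm(0, mean = mu, sd = sigma)
p_negative
```

Whatever the exact value of p_negative, it is greater than zero, although a duration can never be negative.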

3. Why survival analysis

In the example shown here, we know that for individual 1 the event happened at time point 5. For individual 2, we only know that the event had not happened by time point 6; we have no knowledge of what happened after that.

4. Why survival analysis

Let's think about the cab example again. Each day you call a cab and want to analyze how long it takes to arrive at your house. The cab on day 1 arrives at your house after 5 minutes. On day 2 the cab has still not arrived after 6 minutes, so you get annoyed and decide to walk instead. As a result, you never observe when the cab actually arrives. On day 3 the cab does not arrive in the first two minutes, but then you fall asleep and never observe what happens. The cabs on days 4 and 5 arrive after 4 minutes. This type of censoring is called right censoring, and it is the most common type of censoring in survival analysis. There are two other types of censoring, left censoring and interval censoring, which we will not cover in this course.
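A small sketch in base R shows why the censored days cannot simply be ignored. The variable names wait and arrived are hypothetical, chosen only for this illustration. It computes the average waiting time in two naive ways.

```r
# Cab example: minutes waited on each day, and whether the arrival was observed
# (wait and arrived are hypothetical names used for illustration)
wait <- c(5, 6, 2, 4, 4)
arrived <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

# Naive approach 1: treat every recorded time as if it were an arrival time
mean_all <- mean(wait)              # 21 / 5 = 4.2

# Naive approach 2: drop the censored days and average the rest
mean_observed <- mean(wait[arrived])  # 13 / 3, roughly 4.33
```

Neither number is a fair summary. The first treats 6 and 2 minutes as arrival times although the cabs on those days took longer than that. The second throws away the information that the cab on day 2 took more than 6 minutes, longer than any observed arrival.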

5. Creating Surv objects

When working with right-censored time-to-event data, we need to specify this appropriately in R. In our example, we have times 5, 6, 2, 4 and 4. The event indicator is 1 if the event happened and 0 otherwise. This means the two censored individuals, 2 and 3, have a value of 0. With the R package survival, we can specify that the variables time and event belong together. We do this using a Surv object, created with the Surv() function. We will also call this a survival object. In one of the upcoming exercises, we will take a closer look at the Surv object and see what it looks like in the GBSG2 data set. But speaking of R packages, I haven't told you yet which R packages we will be using in this course.
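As a minimal sketch, this is how the cab example could be turned into a survival object. It assumes the survival package is installed; the name cab_surv is a hypothetical choice for this illustration.

```r
library(survival)

# Times and event indicators from the cab example
time <- c(5, 6, 2, 4, 4)
event <- c(1, 0, 0, 1, 1)  # 1 = arrival observed, 0 = censored

# Tie time and event together in one survival object
# (cab_surv is a hypothetical name)
cab_surv <- Surv(time, event)

# When printed, censored observations are marked with a trailing "+"
print(cab_surv)
```

Internally the object stores the two variables side by side, so cab_surv[, "time"] and cab_surv[, "status"] give back the times and the event indicator.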

6. R packages

Aside from the packages that store the datasets, we will focus on two packages during this course. Most importantly, we will use the survival package. It provides all the functionality for basic survival analyses and is a very widely used R package. The survival package allows the user not only to do survival analysis but also to visualize the results. In addition to the plotting features in the survival package, we will use the survminer package for more advanced visualization. We will focus on interpreting visualizations in this course, since we will skip the mathematically more advanced interpretation of the model effect estimates.

7. Let's practice!

Now let's try some examples.