1. What is survival analysis?
Hello, and welcome! My name is Shae Wang and I'm a Data Scientist. In this course, we will learn about survival analysis, its real-world applications, and how to build our own survival models.
2. What is survival analysis?
Many decades ago, doctors and statisticians developed survival analysis for use in clinical trials. They used it to measure the effects of medical treatments on patients' survival outcomes and life expectancy.
3. What is survival analysis?
Despite its name, survival analysis isn't limited to studies about survival. It's a branch of statistics focused on analyzing the time until an event occurs. Today, it is widely used in biostatistics and medical research, and it also has broad applications across many industries.
For example, in engineering, it's used to study how long equipment functions before it fails. In credit finance, it's used to study the time until a loan defaults. The event of interest is not always negative: we can also use survival analysis to estimate the time until an app's free-trial users convert to subscribers, or the time until fruit trees bear their first fruit.
4. What is survival analysis?
During this course, we will refer to this branch of statistical methods as survival analysis. Keep in mind that an event does not necessarily mean death: survival means the event has not occurred, and survival duration means the time until the event of interest occurs.
5. Time-to-event data
Survival analysis relies on the observed rate of event occurrence to make inferences about the underlying survival pattern. It is built around time-to-event data, which typically consists of a "duration" column and an "observed" column. The "duration" column specifies the survival duration, or time until the event. The "observed" column indicates whether the event has been observed.
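To make the two-column layout concrete, here is a minimal sketch in plain Python. In the course this would more likely be a pandas DataFrame; to keep the sketch dependency-free, the hypothetical `Subject` record and the made-up example rows below stand in for it. Rows with `observed=False` are what survival analysis calls right-censored: the event had not happened by the end of observation, so we only know that the true duration is at least the recorded value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """One row of time-to-event data (hypothetical field names mirroring the two columns)."""
    duration: float  # time from the start of observation until the event, or until observation ended
    observed: bool   # True if the event was observed, False if the subject is censored


# Hypothetical example rows, not real data.
data = [
    Subject(duration=5.0, observed=True),   # event happened at time 5
    Subject(duration=8.0, observed=False),  # still event-free when observation stopped at time 8
    Subject(duration=3.0, observed=True),   # event happened at time 3
    Subject(duration=10.0, observed=False), # still event-free at time 10
]

# Split the records back into the "duration" and "observed" columns.
durations = [s.duration for s in data]
observed = [s.observed for s in data]
print(durations)  # [5.0, 8.0, 3.0, 10.0]
print(observed)   # [True, False, True, False]
```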
Time-to-event data requires the event of interest to be clearly defined and binary, so there is no ambiguity about whether it happened. For example, partial failures of varying degrees across machines would be a poor event definition.
Survival analysis also benefits from an abundance of data points, which paint a fuller picture of the survival pattern.
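A quick sanity check can catch an event definition that is not binary before any modeling starts. The function below, `validate_time_to_event`, is a hypothetical helper (not part of any library); it assumes the duration and observed columns are passed as two equal-length sequences and that the event indicator is encoded as 0/1 or False/True.

```python
def validate_time_to_event(durations, observed):
    """Check basic requirements of time-to-event data.

    Raises ValueError if the two columns differ in length, if any duration is
    negative, or if any event indicator is not strictly binary (0/1 or False/True).
    Returns True when all checks pass.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    for d in durations:
        if d < 0:
            raise ValueError(f"negative duration: {d}")
    for e in observed:
        # True == 1 and False == 0 in Python, so booleans pass this check.
        if e not in (0, 1):
            raise ValueError(f"event indicator must be binary, got {e!r}")
    return True


validate_time_to_event([5, 8, 3], [1, 0, 1])  # passes

try:
    # A "half-failed" machine is exactly the kind of ambiguous event to avoid.
    validate_time_to_event([5, 8, 3], [1, 0.5, 1])
except ValueError as err:
    print(err)  # event indicator must be binary, got 0.5
```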
6. The use cases of survival analysis
We can apply survival analysis to answer a family of questions about time to an event, such as: What proportion of our subjects or population will experience the event? How long until the event occurs in our population? At what rate will the event happen to our population? Which factors related to the event can we take into account? Who is more or less likely to experience the event? Given time-to-event data, survival analysis is a versatile tool for drawing inferences from the data and making predictions on new data.
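The first two questions can be made concrete with the Kaplan-Meier estimator, a standard nonparametric estimate of the survival function. It has not been introduced at this point in the course, so treat the following as a minimal from-scratch sketch, assuming right-censored data in the duration/observed layout described earlier; the example data is made up. In practice, a library such as lifelines (its `KaplanMeierFitter` class) would normally do this work.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function S(t).

    Returns a list of (time, survival) pairs, one per distinct time at which
    at least one event was observed. At each such time t,
        S(t) = S(previous) * (1 - d_t / n_t),
    where d_t is the number of events at t and n_t is the number of
    subjects still at risk (not yet failed or censored) just before t.
    """
    pairs = sorted(zip(durations, observed))  # order subjects by duration
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group every subject whose duration equals t.
        while i < len(pairs) and pairs[i][0] == t:
            events += int(pairs[i][1])
            leaving += 1
            i += 1
        if events > 0:
            survival *= 1 - events / n_at_risk
            curve.append((t, survival))
        # Subjects that failed or were censored at t leave the risk set afterwards.
        n_at_risk -= leaving
    return curve


def median_survival_time(curve):
    """First time at which estimated survival falls to 0.5 or below (None if it never does)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical example data: 1 = event observed, 0 = censored.
durations = [5, 8, 3, 10, 6]
observed = [1, 0, 1, 0, 1]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    # 1 - s is the estimated proportion that has experienced the event by time t.
    print(t, round(s, 2))
print("median survival time:", median_survival_time(curve))
```

With this data the script prints `3 0.8`, `5 0.6`, `6 0.4`, and then `median survival time: 6`. The two censored subjects never trigger a drop in the curve, but they still count toward the number at risk until they leave the study, which is how censoring is used rather than discarded.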
7. Predict battery failure time
Let's see how this works in a simplified example. A trucking company frequently has batteries wear out during operations. If we could estimate the remaining battery lifetimes, we could proactively replace batteries to improve operational efficiency. The owner asks us whether we should replace any batteries this month, so let's look at the battery time-to-event data.
8. Battery time-to-event data
Each row contains a battery ID, how long the battery has been in use, whether the battery has already died, and other characteristics. Notice, first, that each battery is a unique row and only "lives" once, and that one battery's survival has no bearing on the other batteries' survival. A battery failure event is unambiguous and binary: batteries are never partially dead. Also, not all batteries have died. This is the type of time-to-event data that is well suited for survival analysis.
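The checks described in this paragraph can be sketched in a few lines. The column layout `(battery_id, months_in_use, died)`, the units, and the values are hypothetical stand-ins for the course's dataset, and `check_suitability` is an illustrative helper. Independence between batteries is an assumption about how the data was generated and cannot be verified from the table itself, so the sketch does not try to check it.

```python
from collections import Counter

# Hypothetical battery records: (battery_id, months_in_use, died).
# Column names and values are illustrative only, not the course's real data.
batteries = [
    ("B001", 14, True),
    ("B002", 30, False),
    ("B003", 22, True),
    ("B004", 9, False),
    ("B005", 27, True),
]


def check_suitability(rows):
    """Check the data properties described above for a battery time-to-event table."""
    ids = [battery_id for battery_id, _, _ in rows]
    died = [flag for _, _, flag in rows]
    duplicates = [i for i, count in Counter(ids).items() if count > 1]
    return {
        "one_row_per_battery": not duplicates,                   # each battery "lives" once
        "binary_event": all(isinstance(f, bool) for f in died),  # dead or alive, never partially dead
        "has_censored_rows": not all(died),                      # some batteries are still running
        "n_failed": sum(died),
        "n_still_running": len(died) - sum(died),
    }


print(check_suitability(batteries))
# {'one_row_per_battery': True, 'binary_event': True, 'has_censored_rows': True, 'n_failed': 3, 'n_still_running': 2}
```

The batteries that are still running are the censored rows: they tell us each of those batteries has lasted at least its recorded number of months, which is useful information even though its failure time is unknown.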
9. Let's practice!
Now that we have covered the basics, let's check whether you have learned when survival analysis might be used and what data is best suited for it!