Fitting the Cox Proportional Hazards model

1. Fitting the Cox Proportional Hazards model

Another method for survival regression is the Cox Proportional Hazards model.

2. Hazard function and hazard rate

The hazard function describes the instantaneous rate at which the event of interest happens at some time, given that the individual has survived up to that time. Hazard rates are the values of this function at particular times. You may remember seeing the hazard curve after fitting the Weibull model. The hazard function and the survival function can be derived from each other.
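To make the link between the two functions concrete, here is a minimal sketch using a Weibull model. The parameter values and helper names are made up for illustration. It recovers the hazard numerically from the survival function, and the survival function numerically from the hazard.

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
LAMBDA = 10.0  # scale
RHO = 1.5      # shape


def survival(t):
    """Weibull survival function S(t) = exp(-(t / lambda) ** rho)."""
    return math.exp(-((t / LAMBDA) ** RHO))


def hazard(t):
    """Weibull hazard h(t) = (rho / lambda) * (t / lambda) ** (rho - 1)."""
    return (RHO / LAMBDA) * (t / LAMBDA) ** (RHO - 1)


def hazard_from_survival(t, eps=1e-6):
    """Recover the hazard as -d/dt log S(t), using a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)


def survival_from_hazard(t, steps=10_000):
    """Recover S(t) = exp(-integral of h from 0 to t), using the midpoint rule."""
    dt = t / steps
    cumulative_hazard = sum(hazard((i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-cumulative_hazard)


t = 5.0
print(hazard(t), hazard_from_survival(t))      # the two values should agree closely
print(survival(t), survival_from_hazard(t))    # and so should these
```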

3. The proportional hazards assumption

The proportional hazards assumption means all individuals have hazard rates proportional to one another. In other words, individual A's hazard function is individual B's hazard function times some constant. For example, these two survival curves satisfy the proportional hazards assumption. This assumption has two major implications. One, there is a baseline hazard function and other hazard functions are specified with a scaling factor. Two, the relative survival impact from a variable does not change with time.
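As a rough illustration of the assumption, the sketch below defines a made-up baseline hazard and two individuals whose hazards are constant multiples of it. The baseline shape and the scaling factors are assumptions for the example only. The hazard ratio between the two individuals stays the same at every time point, and each survival curve is the baseline survival raised to the power of the scaling factor.

```python
import math


def baseline_hazard(t):
    """Hypothetical baseline hazard that rises linearly with time."""
    return 0.02 * t


def baseline_survival(t):
    """Survival implied by the baseline: exp(-integral of 0.02 * s ds) = exp(-0.01 * t ** 2)."""
    return math.exp(-0.01 * t ** 2)


SCALE_A = 1.0  # individual A sits exactly at the baseline
SCALE_B = 2.5  # individual B's hazard is 2.5 times the baseline at all times


def hazard(t, scale):
    """Individual hazard: the baseline multiplied by a constant scaling factor."""
    return scale * baseline_hazard(t)


def survival(t, scale):
    """Scaling the hazard by a constant raises the baseline survival to that power."""
    return baseline_survival(t) ** scale


for t in (1.0, 5.0, 10.0):
    ratio = hazard(t, SCALE_B) / hazard(t, SCALE_A)
    print(t, ratio, survival(t, SCALE_A), survival(t, SCALE_B))
```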

4. The Cox Proportional Hazards model

The proportional hazards assumption underlies the Cox PH model, which has two parts. The left part is a population-level baseline hazard function that changes with time. The right part describes the linear relationship between the covariates and the log of the hazard. Notice that time t does not appear in the right part of the equation. The Cox PH model is another way to run survival regression with covariates, and it quantifies the influence of factors on survival.
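For reference, the model is commonly written in the form below. The symbols are assumptions, since the transcript does not reproduce the slide: b_0(t) is the baseline hazard, x_1 to x_n are the covariates, and b_1 to b_n are their coefficients.

```latex
h(t \mid x) = b_0(t) \, \exp\!\left( \sum_{i=1}^{n} b_i x_i \right)
\qquad\Longleftrightarrow\qquad
\log h(t \mid x) = \log b_0(t) + \sum_{i=1}^{n} b_i x_i
```

Written this way, time enters only through b_0(t), while the covariates enter only through the exponential factor.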

5. Data requirement for Cox PH model

Data-wise, we need a durations column, an event column, and columns for the covariates we want to regress on. Covariates may be continuous variables or categorical variables that are one-hot encoded. If no event column is specified, the model will assume that no individuals are censored.
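Here is a small sketch of what such a DataFrame might look like and how a categorical column could be one-hot encoded with pandas. The column names and values are invented for illustration. Dropping the first dummy level is a choice made here to avoid perfectly collinear columns, not something the course prescribes.

```python
import pandas as pd

# Hypothetical raw data: column names and values are made up for illustration.
raw_df = pd.DataFrame({
    "duration": [12, 30, 7, 24, 18],        # observed time, for example in months
    "paid_off": [1, 0, 1, 1, 0],            # 1 = event observed, 0 = censored
    "interest": [3.5, 4.2, 5.0, 3.9, 4.6],  # continuous covariate
    "property_type": ["house", "condo", "house", "townhouse", "condo"],  # categorical
})

# One-hot encode the categorical covariate. drop_first=True removes one level
# ("condo" here, the first alphabetically), which then acts as the reference category.
model_df = pd.get_dummies(raw_df, columns=["property_type"], drop_first=True, dtype=int)

print(model_df.columns.tolist())
```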

6. Fitting the Cox PH model

To fit the Cox PH model with lifelines, we import the CoxPHFitter class from the library and instantiate it. Then, the dot-fit method needs at least two parameters: df, a DataFrame of the survival data, and duration_col, the name of the duration column as a string. Optionally, specify event_col with the name of the event column. Calling dot-fit fits the model to the DataFrame and makes the results available, such as the dot-summary property and prediction methods like dot-predict_survival_function.

7. Example Cox PH model

The mortgage_df DataFrame describes the time to full mortgage payoff. All columns except the duration and paid_off columns are covariates. We import the CoxPHFitter class from lifelines and instantiate it. Then we call dot-fit with the DataFrame, which is mortgage_df, the name of the durations column, which is duration, and the name of the event column, which is paid_off.

8. Custom model

When calling dot-fit, all columns except the durations and event columns are used as covariates. Sometimes we may want to specify which covariates to use. The first way is to remove the columns we don't want by creating a new DataFrame, using dot-loc to select columns. The second way is to use the formula parameter when calling dot-fit, passing the covariates we want to use as a string. The second way is more convenient, but it doesn't scale to a large number of covariates.

9. Interpret coefficients

When we print the model summary by calling dot-summary, the exp coef column is e to the power of each coefficient. It is called the hazard ratio and indicates the factor by which the hazard is multiplied for a one-unit increase in the covariate, with the other covariates held fixed. For example, interest has a coefficient of 0-point-31, meaning that if interest increases by 1, the hazard changes by a factor of e to the power of 0-point-31, which is 1-point-37. This means a 37% increase in hazard.
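The arithmetic behind this reading can be sketched in a few lines. The helper names and the coefficient values below are hypothetical, chosen so the results are easy to check by hand. They are not taken from the mortgage model.

```python
import math


def hazard_ratio(coef, delta=1.0):
    """Factor by which the hazard is multiplied when the covariate rises by `delta`."""
    return math.exp(coef * delta)


def percent_change_in_hazard(coef, delta=1.0):
    """The same effect expressed as a percentage change in the hazard."""
    return (hazard_ratio(coef, delta) - 1.0) * 100.0


# Hypothetical coefficients: log(2) doubles the hazard, 0 leaves it unchanged,
# and -log(2) halves it.
for coef in (math.log(2), 0.0, -math.log(2)):
    print(coef, hazard_ratio(coef), percent_change_in_hazard(coef))
```

A positive coefficient gives a hazard ratio above 1 (higher hazard), a negative one gives a ratio below 1 (lower hazard), and a coefficient of zero means the covariate has no effect on the hazard.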

10. Let's practice!

Let's practice!