Preprocess censored data
You are a marine biologist studying the lifespan of spinner dolphins. You have access to historical data detailing their birth and death dates. Some tagged dolphins migrated to a different part of the water, and the lab lost track of them. Some dolphins are migrants from a different pod, and their exact birth dates are unknown. Some dolphins are still alive!
- If the birth date is NaN, the dolphin is a migrant.
- If the death date is NaN, the dolphin either ran away or is alive.
The DataFrame is called dolphin_df. To create a new column called observed to flag if a dolphin's lifetime is censored, fill out the function check_observed with appropriate values and use .apply() to apply the function to dolphin_df.
pandas and numpy are loaded as pd and np, respectively.
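As a rough illustration of the mechanics only (not the exercise solution), the sketch below builds a small made-up frame and labels each row according to which date is missing. It reuses the birth_date and death_date column names from the sample code; the name toy_df, the helper describe_row, and all data values are assumptions for illustration.

```python
import pandas as pd

# Made-up toy data; the real dolphin_df is not shown in the exercise.
toy_df = pd.DataFrame({
    "birth_date": pd.to_datetime(["2001-05-01", None, "2010-03-15"]),
    "death_date": pd.to_datetime(["2020-08-20", "2018-01-10", None]),
})

def describe_row(row):
    """Hypothetical helper: label why a row's lifetime may be incomplete."""
    if pd.isna(row["birth_date"]):
        return "migrant"
    elif pd.isna(row["death_date"]):
        return "lost or alive"
    return "complete"

# axis=1 passes each row to the function as a Series.
toy_df["status"] = toy_df.apply(describe_row, axis=1)
print(toy_df)
```

Missing dates parsed by pd.to_datetime become NaT, which pd.isna() treats as missing just like NaN.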
This exercise is part of the course
Survival Analysis in Python
Exercise instructions
- Create a function check_observed to return 0 if the data point is censored, and 1 otherwise.
- Create a censorship flag column called observed using the function check_observed.
- Print the average value of the observed column in the console.
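One way to read the last step: because the flag takes only the values 0 and 1, its average is the share of rows flagged 1. A minimal sketch with made-up flags (the names flags and share_observed are hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up 0/1 flags: three rows flagged 1, one row flagged 0.
flags = pd.Series([1, 0, 1, 1])

# The mean of a 0/1 column equals the fraction of ones.
share_observed = np.average(flags)
print(share_observed)  # 0.75
```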
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a function to return 1 if observed, 0 otherwise
def check_observed(row):
    if pd.isna(row['birth_date']):
        flag = ____
    elif pd.isna(row['death_date']):
        flag = ____
    else:
        flag = ____
    return ____

# Create a censorship flag column
dolphin_df[____] = dolphin_df.apply(____, axis=1)

# Print average of observed
print(np.average(____))