Preprocess censored data
You are a marine biologist studying the lifespan of spinner dolphins. You have access to historical data detailing their birth and death dates. Some tagged dolphins migrated to different waters and the lab lost track of them. Some dolphins are migrants from a different pod, and their exact birth dates are unknown. Some dolphins are still alive!
- If the birth date is NaN, the dolphin is a migrant.
- If the death date is NaN, the dolphin either ran away or is alive.
The DataFrame is called dolphin_df. To create a new column called observed to flag if a dolphin's lifetime is censored, fill out the function check_observed with appropriate values and use .apply() to apply the function to dolphin_df.
pandas and numpy are loaded as pd and np, respectively.
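To see how missing dates show up in practice, here is a minimal sketch on a made-up three-row frame. The column names birth_date and death_date match the sample code in this exercise; the name toy_df and all of the dates are invented for illustration and are not the exercise's real data.

```python
import pandas as pd

# Hypothetical toy data; the real dolphin_df is provided by the exercise.
toy_df = pd.DataFrame({
    "birth_date": pd.to_datetime(["1990-05-01", None, "1995-03-12"]),
    "death_date": pd.to_datetime(["2010-06-15", "2012-01-20", None]),
})

# isna() marks missing entries (NaN for numbers, NaT for dates) as True.
missing = toy_df.isna()
print(missing)

# .apply(..., axis=1) calls the function once per row; here it counts
# how many of the two dates are missing for each dolphin.
n_missing = toy_df.apply(lambda row: int(pd.isna(row).sum()), axis=1)
print(n_missing.tolist())
```

In this toy frame the second dolphin has no birth date (a migrant) and the third has no death date (lost or still alive), so both lifetimes would count as censored.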
This exercise is part of the course Survival Analysis in Python
Exercise instructions
- Create a function check_observed to return 0 if the data point is censored, and 1 otherwise.
- Create a censorship flag column called observed using the function check_observed.
- Print the average value of the observed column in the console.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a function to return 1 if observed 0 otherwise
def check_observed(row):
    if pd.isna(row['birth_date']):
        flag = ____
    elif pd.isna(row['death_date']):
        flag = ____
    else:
        flag = ____
    return ____
# Create a censorship flag column
dolphin_df[____] = dolphin_df.apply(____, axis=1)
# Print average of observed
print(np.average(____))
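
One plausible way to fill in the blanks, following the instructions (0 when either date is missing, 1 otherwise), is sketched below. It is not confirmed as the course's official solution, and the four-row dolphin_df built here is invented toy data standing in for the real DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the exercise's dolphin_df.
dolphin_df = pd.DataFrame({
    "birth_date": pd.to_datetime(["1990-05-01", None, "1995-03-12", "2001-07-30"]),
    "death_date": pd.to_datetime(["2010-06-15", "2012-01-20", None, "2019-11-02"]),
})

# Return 1 if the lifetime is fully observed, 0 if it is censored.
def check_observed(row):
    if pd.isna(row["birth_date"]):    # unknown birth date: migrant
        flag = 0
    elif pd.isna(row["death_date"]):  # unknown death date: lost or still alive
        flag = 0
    else:                             # both dates known
        flag = 1
    return flag

# Create the censorship flag column, one call per row.
dolphin_df["observed"] = dolphin_df.apply(check_observed, axis=1)

# The mean of a 0/1 column is the fraction of fully observed lifetimes.
print(np.average(dolphin_df["observed"]))

# Cross-check with a vectorized form: observed only if both dates are present.
vectorized = dolphin_df[["birth_date", "death_date"]].notna().all(axis=1).astype(int)
```

On this toy data two of the four lifetimes are fully observed, so the printed average is 0.5. The vectorized line gives the same flags without a row-wise function call, which may be preferable on large frames.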