
Preprocess censored data

You are a marine biologist studying the lifespan of spinner dolphins. You have access to historical data detailing their birth and death dates. Some tagged dolphins migrated to other waters and the lab lost track of them. Some dolphins are migrants from a different pod, so their exact birth dates are unknown. Some dolphins are still alive!

  • If the birth date is NaN, the dolphin is a migrant.
  • If the death date is NaN, the dolphin either migrated away (so the lab lost track of it) or is still alive.
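
The two rules above can be checked on a small example. The sketch below is only an illustration: sample_df and its dates are made up, and the column names birth_date and death_date are taken from the exercise's sample code rather than from the real dolphin_df, which is not shown here.

```python
import pandas as pd

# Hypothetical sample data; the real dolphin_df is not shown in the exercise.
sample_df = pd.DataFrame({
    "birth_date": pd.to_datetime(["1990-05-01", None, "1995-03-12", "2001-07-30"]),
    "death_date": pd.to_datetime(["2010-08-15", "2012-01-20", None, None]),
})

# Missing birth date: the dolphin is a migrant.
is_migrant = sample_df["birth_date"].isna()

# Missing death date: the dolphin was lost track of or is still alive.
no_death_record = sample_df["death_date"].isna()

print(is_migrant.tolist())
print(no_death_record.tolist())
```

In both cases the full lifetime cannot be computed from the data, which is why such rows count as censored.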

The DataFrame is called dolphin_df. To create a new column called observed that flags whether a dolphin's lifetime is censored, complete the function check_observed with the appropriate values and use .apply() to apply it to each row of dolphin_df.

pandas and numpy are loaded as pd and np, respectively.

This exercise is part of the course

Survival Analysis in Python


Exercise instructions

  • Create a function check_observed to return 0 if the data point is censored, and 1 otherwise.
  • Create a censorship flag column called observed using the function check_observed.
  • Print the average value of the observed column in the console.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create a function to return 1 if observed, 0 otherwise
def check_observed(row):
    if pd.isna(row['birth_date']):
        flag = ____
    elif pd.isna(row['death_date']):
        flag = ____
    else:
        flag = ____
    return ____
  
# Create a censorship flag column
dolphin_df[____] = dolphin_df.apply(____, axis=1)

# Print average of observed
print(np.average(____))
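
For reference, here is one possible completion of the template, run on a made-up DataFrame. It is a sketch under assumptions, not the official solution: sample_df and its dates are hypothetical stand-ins for dolphin_df, and the flag values follow the instruction to return 0 for censored data points and 1 otherwise.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dolphin_df (the real data is not shown).
sample_df = pd.DataFrame({
    "birth_date": pd.to_datetime(["1990-05-01", None, "1995-03-12", "2001-07-30"]),
    "death_date": pd.to_datetime(["2010-08-15", "2012-01-20", None, None]),
})

def check_observed(row):
    """Return 0 if the lifetime is censored, 1 if it is fully observed."""
    if pd.isna(row["birth_date"]):
        flag = 0  # migrant: birth date unknown
    elif pd.isna(row["death_date"]):
        flag = 0  # lost track of or still alive: death date unknown
    else:
        flag = 1  # both dates known
    return flag

# axis=1 passes each row (not each column) to check_observed.
sample_df["observed"] = sample_df.apply(check_observed, axis=1)

# The mean of a 0/1 flag is the share of fully observed lifetimes.
observed_share = np.average(sample_df["observed"])
print(observed_share)
```

Because observed holds only 0s and 1s, its average reads directly as the fraction of dolphins whose birth and death dates are both known.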