Deleting missing data

Earlier you saw that the interest rate (int_rate) in the data set loan_data depends on the customer. Unfortunately, some observations are missing an interest rate. You now need to find out how many interest rates are missing and then delete the affected observations.

In this exercise you will use the function which() to create an index of rows that contain an NA. You will then use this index to delete rows with NAs.
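
To illustrate how the two functions combine, here is a minimal sketch on a made-up vector (rates and its values are hypothetical, not taken from loan_data). is.na() flags each element as missing or not, and which() turns those flags into positions.

```r
# Hypothetical interest rates with two missing entries (not from loan_data)
rates <- c(10.5, NA, 7.9, NA, 13.2)

# is.na() returns one logical flag per element (TRUE where the value is NA)
missing_flags <- is.na(rates)

# which() converts the flags into the positions of the TRUE values
na_positions <- which(missing_flags)
print(na_positions)  # positions 2 and 4

# Negative indexing drops the elements at those positions
rates_complete <- rates[-na_positions]
print(rates_complete)
```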

This exercise is part of the course

Credit Risk Modeling in R

Exercise instructions

  • Take a look at the number of missing values for the variable int_rate using summary().
  • Use which() and is.na() to create an index of the observations without a recorded interest rate. Store the result in the object na_index.
  • Create a new data set called loan_data_delrow_na, which does not contain the observations with missing interest rates.
  • Recall that we made a copy of loan_data called loan_data_delcol_na. Instead of deleting the observations with missing interest rates, delete the entire int_rate column by setting it equal to NULL.
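
The steps above can be sketched on a small stand-in data frame. This is only an illustration under stated assumptions: loan_toy, its columns, and its values are invented here and are not the course's loan_data.

```r
# Toy stand-in for loan_data; column names and values are illustrative only
loan_toy <- data.frame(
  loan_amnt = c(5000, 2400, 10000, 7000),
  int_rate  = c(10.65, NA, 13.49, NA)
)

# summary() reports the number of NA's for a numeric column
summary(loan_toy$int_rate)
n_missing <- sum(is.na(loan_toy$int_rate))

# Row indices of the observations without a recorded interest rate
na_index <- which(is.na(loan_toy$int_rate))

# Option 1: drop the rows with a missing interest rate
loan_toy_delrow_na <- loan_toy[-na_index, ]

# Option 2: keep every row but delete the int_rate column
loan_toy_delcol_na <- loan_toy
loan_toy_delcol_na$int_rate <- NULL
```

One caveat with negative indexing: if no values are missing, na_index is empty and loan_toy[-na_index, ] returns zero rows instead of the full data set. Filtering with a logical condition, such as loan_toy[!is.na(loan_toy$int_rate), ], avoids that case.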

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Look at summary of loan_data


# Get indices of missing interest rates: na_index
na_index <- 

# Remove observations with missing interest rates: loan_data_delrow_na
___ <- loan_data[-___, ]

# Make copy of loan_data
loan_data_delcol_na <- loan_data

# Delete interest rate column from loan_data_delcol_na