Deleting missing data
You saw before that the interest rate (int_rate) in the data set loan_data depends on the customer. Unfortunately, some observations have missing interest rates. You now need to identify how many interest rates are missing and then delete the observations that lack one.
In this exercise, you will use the function which() to create an index of the rows that contain an NA in int_rate. You will then use this index to delete those rows.
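A minimal sketch of the which()/is.na() pattern on a small data frame may help. The data frame toy_loans and its values are hypothetical stand-ins for loan_data and do not come from the course data set.

```r
# Hypothetical toy data standing in for loan_data (values are made up)
toy_loans <- data.frame(
  loan_amnt = c(5000, 2400, 10000, 5000, 3000),
  int_rate  = c(10.65, NA, 13.49, NA, 12.69)
)

# is.na() returns a logical vector; which() turns it into row positions
na_index <- which(is.na(toy_loans$int_rate))
print(na_index)  # row positions 2 and 4

# Negative indices drop those rows; the empty column slot keeps all columns
toy_loans_delrow_na <- toy_loans[-na_index, ]
print(nrow(toy_loans_delrow_na))  # 3 rows remain
```

The empty slot after the comma in toy_loans[-na_index, ] means "all columns", so only rows are removed.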
This exercise is part of the course Credit Risk Modeling in R.
Exercise instructions
- Take a look at the number of missing inputs for the variable int_rate using summary().
- Use which() and is.na() to create an index of the observations without a recorded interest rate. Store the result in the object na_index.
- Create a new data set called loan_data_delrow_na, which does not contain the observations with missing interest rates.
- Recall that we made a copy of loan_data called loan_data_delcol_na. Instead of deleting the observations with missing interest rates, delete the entire int_rate column by setting it equal to NULL.
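The two strategies in these instructions trade off differently: deleting rows keeps every variable but loses observations, while deleting the column keeps every observation but loses the variable. A minimal sketch on a hypothetical toy data frame (toy_loans, not the course data) makes the difference visible through dim().

```r
# Hypothetical toy data standing in for loan_data (values are made up)
toy_loans <- data.frame(
  loan_status = c(0, 1, 0, 0, 1),
  int_rate    = c(10.65, NA, 13.49, NA, 12.69),
  grade       = c("B", "C", "C", "A", "B")
)

# Count missing interest rates; summary() reports the same count as "NA's"
n_missing <- sum(is.na(toy_loans$int_rate))
summary(toy_loans$int_rate)

# Strategy 1: drop the rows with a missing interest rate
na_index <- which(is.na(toy_loans$int_rate))
toy_delrow_na <- toy_loans[-na_index, ]

# Strategy 2: keep every row, drop the int_rate column instead
toy_delcol_na <- toy_loans
toy_delcol_na$int_rate <- NULL

dim(toy_loans)      # 5 rows, 3 columns
dim(toy_delrow_na)  # 3 rows, 3 columns
dim(toy_delcol_na)  # 5 rows, 2 columns
```

Which strategy is preferable depends on how many values are missing and how important the variable is for the model.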
The following R code walks through these steps.
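# Look at summary of int_rate; the "NA's" entry counts the missing rates
summary(loan_data$int_rate)
# Get indices of missing interest rates: na_index
na_index <- which(is.na(loan_data$int_rate))
# Remove observations with missing interest rates: loan_data_delrow_na
# (note: if na_index were empty, loan_data[-na_index, ] would drop every row)
loan_data_delrow_na <- loan_data[-na_index, ]
# Make copy of loan_data
loan_data_delcol_na <- loan_data
# Delete interest rate column from loan_data_delcol_na
loan_data_delcol_na$int_rate <- NULL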
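One caveat about the negative-index pattern: if no interest rates are missing, which() returns integer(0), and indexing with -integer(0) selects zero rows instead of all of them. A sketch of a safer variant indexes with the logical vector from !is.na() directly, which behaves correctly whether or not NAs are present. The helper name drop_missing_rate() and the toy data frame complete_loans are hypothetical, not part of the course code.

```r
# Hypothetical helper: keep only rows with a recorded interest rate.
# Logical indexing avoids the empty-index pitfall of df[-which(...), ];
# drop = FALSE keeps a data frame even if df has a single column.
drop_missing_rate <- function(df) {
  df[!is.na(df$int_rate), , drop = FALSE]
}

# Toy data with no missing rates (values are made up)
complete_loans <- data.frame(int_rate = c(10.65, 13.49, 12.69),
                             grade    = c("B", "C", "B"))

na_index <- which(is.na(complete_loans$int_rate))  # integer(0)
nrow(complete_loans[-na_index, ])        # 0: every row is dropped
nrow(drop_missing_rate(complete_loans))  # 3: all rows kept
```

When the data do contain NAs, both approaches return the same rows; the logical version simply has no edge case to guard against.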