Histograms
The data set loan_data is loaded in your workspace. You previously explored categorical variables using the CrossTable() function. Now you would like to explore continuous variables to identify potential outliers or unexpected data structures.
To do this, let's experiment with the function hist() to understand the distribution of loan amounts across customers.
This exercise is part of the course
Credit Risk Modeling in R
Exercise instructions
- Use hist() to create a histogram with only one argument: loan_data$loan_amnt. Assign the result to a new object called hist_1.
- Use $breaks along with the object hist_1 to get more information on the histogram breaks. Knowing the location of the breaks is important because if they are poorly chosen, the histogram may be misleading.
- Change the number of breaks in hist_1 to 200 by specifying the breaks argument. Additionally, name the x-axis "Loan amount" using the xlab argument and title it "Histogram of the loan amount" using the main argument. Save the result to hist_2. Why do the peaks occur where they occur?
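As a rough illustration of how the breaks behave, here is a minimal sketch on simulated data. The vector amounts and its log-normal parameters are assumptions made up for this example and are not the course's loan_data. The sketch uses plot = FALSE so that hist() only returns the histogram object; leaving that argument out draws the plot as well.

```r
# Simulated, hypothetical amounts (NOT the course's loan_data),
# rounded to the nearest 100.
set.seed(42)
amounts <- round(rlnorm(1000, meanlog = 9, sdlog = 0.5), -2)

# With no breaks argument, hist() uses its default rule ("Sturges").
# plot = FALSE returns the histogram object without drawing it.
h_default <- hist(amounts, plot = FALSE)
h_default$breaks            # bin boundaries chosen by default

# A single number passed to breaks is only a suggestion: the break
# points are set to "pretty" values, so the actual number of bins
# may differ from the number requested.
h_fine <- hist(amounts, breaks = 200, plot = FALSE)
length(h_fine$breaks) - 1   # number of bins actually used
```

Comparing the two $breaks vectors shows how much finer the second binning is, which is why narrow peaks that a coarse histogram hides can become visible with more breaks.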
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create histogram of loan_amnt: hist_1
# Print locations of the breaks in hist_1
# Change number of breaks and add labels: hist_2
hist_2 <- hist(loan_data$loan_amnt, breaks = ___, xlab = "___",
               main = "___")