Data discovery
In the coding exercises you will apply the theory you just saw to a new dataset. This dataset describes bank customers and will be used to predict whether customers will default on their loan payments.
There are very helpful functions in R to get an overview of the dataset at hand. For now you will only look at summary() and str().
The necessary packages are loaded, and the dataset defaultData is already present in your working environment.
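To make the difference between the two functions concrete, here is a minimal sketch on a small made-up data frame. The object toy and the columns limitBal and age are assumptions for illustration only; they are not taken from defaultData, whose actual columns you will see once you run the exercise.

```r
# Hypothetical toy data standing in for defaultData.
# Only the column name PaymentDefault comes from the exercise text;
# limitBal and age are invented for this illustration.
toy <- data.frame(
  limitBal       = c(20000, 120000, 90000, 50000, 50000, 500000),
  age            = c(24L, 26L, 34L, 37L, 57L, 29L),
  PaymentDefault = c(1L, 1L, 0L, 0L, 0L, 0L)
)

# summary() gives per-column statistics:
# minimum, quartiles, median, mean, and maximum for numeric columns.
summary(toy)

# str() gives the structure: object class, number of rows and columns,
# and each column's type together with its first few values.
str(toy)
```

In short, summary() tells you about the distribution of the values, while str() tells you how the data is stored.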
This exercise is part of the course Machine Learning for Marketing Analytics in R.
Exercise instructions
- Use summary() and str() to look at your data.
- Also gain more insight into the variable of interest, PaymentDefault, by plotting a bar chart of its two levels.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Summary of data
___(defaultData)
# Look at data structure
___(defaultData)
# Analyze the balance of the dependent variable
ggplot(___, aes(x = ___)) +
  geom_histogram(stat = "count")
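As a rough illustration of what checking the balance of a two-level variable can look like, the sketch below counts the levels of a made-up 0/1 column and draws a bar chart. The data frame toy is hypothetical, and ggplot2 is assumed to be installed. The sketch uses geom_bar(), the usual ggplot2 layer for counting discrete values, rather than the geom_histogram(stat = "count") call from the sample code, and it is not meant as the exercise's official solution.

```r
library(ggplot2)  # assumption: ggplot2 is installed

# Hypothetical toy data; only the column name PaymentDefault
# comes from the exercise text, the values are invented.
toy <- data.frame(PaymentDefault = c(1L, 1L, 0L, 0L, 0L, 0L))

# Absolute and relative frequencies of the two levels
counts <- table(toy$PaymentDefault)
shares <- prop.table(counts)
print(counts)
print(round(shares, 2))

# Bar chart of the two levels; factor() makes the x axis discrete
p <- ggplot(toy, aes(x = factor(PaymentDefault))) +
  geom_bar() +
  labs(x = "PaymentDefault", y = "count")
print(p)
```

If one bar is much taller than the other, the dependent variable is imbalanced, which is worth keeping in mind when you later judge a model's accuracy.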