Looking at data
The dataset `salesData` is loaded in the workspace. It contains information on customers for months one to three. For month four, only the sales are included. The following table describes some of the variables whose meaning is less obvious.
Variable | Description |
---|---|
id | identification number of customer |
mostFreqStore | store person bought mostly from |
mostFreqCat | category person purchased mostly |
nCats | number of different categories |
preferredBrand | brand person purchased mostly |
nBrands | number of different brands |
The packages `readr`, `dplyr`, `corrplot`, and `ggplot2` have been installed and loaded.
This exercise is part of the course Machine Learning for Marketing Analytics in R.
Exercise instructions
- Use the structure command `str()` to get an overview of the data.
- Now visualize the correlation of the continuous explanatory variables for the past three months with the sales variable of this month. Use the functions `cor()` and `corrplot()` and the pipe operator. Note that the right variables have already been selected for you.
- Additionally, make a boxplot displaying the distribution of `salesThisMon` dependent on the levels of the categorical variable `preferredBrand`. The same has already been done for the categorical variable `mostFreqStore`.
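The correlation step can be sketched on a small made-up data frame. This is only an illustration under stated assumptions: the data frame `toy`, its column values, and the columns `nItems` and `meanItemPrice` are hypothetical and not taken from `salesData`. Base R subsetting is used instead of `dplyr` so the sketch runs on its own, and `corrplot()` is only called if the package is installed.

```r
# Toy data standing in for salesData; all values are made up for illustration.
toy <- data.frame(
  id             = 1:6,
  nItems         = c(12, 30, 7, 22, 15, 40),
  meanItemPrice  = c(3.5, 2.1, 8.0, 4.2, 5.5, 1.9),
  salesThisMon   = c(40, 65, 52, 90, 80, 75),
  preferredBrand = c("A", "B", "A", "C", "B", "C")
)

# Keep the numeric columns, then drop the identifier, which carries no signal.
numericCols <- toy[sapply(toy, is.numeric)]
numericCols$id <- NULL

# Pairwise Pearson correlations: a symmetric matrix with ones on the diagonal.
corMat <- cor(numericCols)
print(round(corMat, 2))

# Draw the correlation plot only if the corrplot package is available.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(corMat)
}
```

The row or column for `salesThisMon` is the one to read when judging how strongly each explanatory variable moves together with the sales of the current month.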
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Structure of dataset
str(___, give.attr = FALSE)

# Visualization of correlations
salesData %>% select_if(is.numeric) %>%
  select(-id) %>%
  ___
  ___

# Frequent stores
ggplot(salesData) +
  geom_boxplot(aes(x = mostFreqStore, y = salesThisMon))

# Preferred brand
ggplot(___) +
  geom_boxplot(aes(x = ___, y = ___))
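To make concrete what a boxplot per category level summarizes, here is a minimal sketch on made-up data. The data frame `toy` and its values are hypothetical. The sketch uses base R's `fivenum()` and `boxplot()` rather than `ggplot2`, so the hinges shown here may differ slightly from the quartiles that `geom_boxplot()` computes.

```r
# Toy data; brand labels and sales figures are made up for illustration.
toy <- data.frame(
  preferredBrand = factor(c("A", "A", "A", "B", "B", "B")),
  salesThisMon   = c(10, 20, 30, 40, 60, 80)
)

# Five-number summary per brand:
# minimum, lower hinge, median, upper hinge, maximum.
brandSummary <- lapply(split(toy$salesThisMon, toy$preferredBrand), fivenum)
print(brandSummary)

# Base R boxplot of sales by brand, one box per factor level.
boxplot(salesThisMon ~ preferredBrand, data = toy)
```

Comparing the medians and the spread of the boxes across levels gives a quick impression of whether the categorical variable is related to sales.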