Looking at data
The dataset `salesData` is loaded in the workspace. It contains information on customers for months one to three. For month four, only the sales are included. The following table describes some of the variables whose meaning is less obvious.
Variable | Description |
---|---|
id | identification number of customer |
mostFreqStore | store person bought mostly from |
mostFreqCat | category person purchased mostly |
nCats | number of different categories |
preferredBrand | brand person purchased mostly |
nBrands | number of different brands |
The packages `readr`, `dplyr`, `corrplot`, and `ggplot2` have been installed and loaded.
This exercise is part of the course Machine Learning for Marketing Analytics in R.
Exercise instructions
- Use the structure command `str()` to get an overview of the data.
- Now visualize the correlation of the continuous explanatory variables for the past three months with the sales variable of this month. Use the functions `cor()` and `corrplot()` and the pipe operator. Note that the right variables have already been selected for you.
- Additionally, make a boxplot displaying the distribution of `salesThisMon` dependent on the levels of the categorical variable `preferredBrand`. The same has already been done for the categorical variable `mostFreqStore`.
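
The overview and correlation steps above can be sketched on a small made-up data frame. This is only an illustration under assumptions: `toySales` and its columns `nItems` and `meanItemPrice` are hypothetical stand-ins, not the actual contents of `salesData`, and the sketch assumes `dplyr` and `corrplot` are installed.

```r
library(dplyr)
library(corrplot)

# Hypothetical toy data; NOT the real salesData
set.seed(1)
toySales <- data.frame(
  id = 1:50,
  nItems = rpois(50, lambda = 20),               # assumed numeric predictor
  meanItemPrice = runif(50, min = 5, max = 30),  # assumed numeric predictor
  preferredBrand = sample(c("A", "B", "C"), 50, replace = TRUE)
)
toySales$salesThisMon <- 2 * toySales$nItems + rnorm(50, sd = 5)

# Compact overview of column types and first values
str(toySales, give.attr = FALSE)

# Keep numeric columns, drop the identifier, compute the correlation matrix
corMat <- toySales %>%
  select_if(is.numeric) %>%
  select(-id) %>%
  cor()

# Draw the correlation matrix
corrplot(corMat)
```

Dropping `id` before calling `cor()` matters because an identifier is numeric but carries no meaning as a predictor, so its correlations would only clutter the plot.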
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Structure of dataset
str(___, give.attr = FALSE)

# Visualization of correlations
salesData %>% select_if(is.numeric) %>%
  select(-id) %>%
  ___
  ___

# Frequent stores
ggplot(salesData) +
  geom_boxplot(aes(x = mostFreqStore, y = salesThisMon))

# Preferred brand
ggplot(___) +
  geom_boxplot(aes(x = ___, y = ___))
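
As a rough illustration of the boxplot pattern used above, the following sketch draws one box per level of a categorical variable. The data frame `toySales` and its values are hypothetical and only borrow the column names `preferredBrand` and `salesThisMon` from the exercise; the sketch assumes `ggplot2` is installed.

```r
library(ggplot2)

# Hypothetical toy data; NOT the real salesData
set.seed(2)
toySales <- data.frame(
  preferredBrand = sample(c("A", "B", "C"), 60, replace = TRUE),
  salesThisMon = rgamma(60, shape = 2, scale = 50)
)

# One box per brand level, showing the distribution of sales within it
p <- ggplot(toySales) +
  geom_boxplot(aes(x = preferredBrand, y = salesThisMon))
print(p)
```

The categorical variable goes on the x axis and the continuous one on the y axis, so each box summarizes the median, quartiles, and outliers of sales within one category.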