Looking at data
The dataset `salesData` is loaded in the workspace. It contains information on customers for months one to three; for month four, only the sales are included. The following table describes some of the variables whose meaning is less obvious.
Variable | Description |
---|---|
id | identification number of customer |
mostFreqStore | store person bought mostly from |
mostFreqCat | category person purchased mostly |
nCats | number of different categories |
preferredBrand | brand person purchased mostly |
nBrands | number of different brands |
The packages `readr`, `dplyr`, `corrplot`, and `ggplot2` have been installed and loaded.
This exercise is part of the course
Machine Learning for Marketing Analytics in R
Exercise instructions
- Use the structure command `str()` to get an overview of the data.
- Now visualize the correlation of the continuous explanatory variables for the past three months with the sales variable of this month. Use the functions `cor()` and `corrplot()` and the pipe operator. Note that the right variables have already been selected for you.
- Additionally, make a boxplot displaying the distribution of `salesThisMon` dependent on the levels of the categorical variable `preferredBrand`. The same has already been done for the categorical variable `mostFreqStore`.
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Structure of dataset
str(___, give.attr = FALSE)

# Visualization of correlations
salesData %>% select_if(is.numeric) %>%
  select(-id) %>%
  ___
  ___

# Frequent stores
ggplot(salesData) +
  geom_boxplot(aes(x = mostFreqStore, y = salesThisMon))

# Preferred brand
ggplot(___) +
  geom_boxplot(aes(x = ___, y = ___))
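The same three steps (structure overview, correlation plot, boxplot by category) can be tried on made-up data. The following is only a rough sketch: the data frame `toy` and its columns `nItems`, `meanItemPrice`, `brand`, and `sales` are hypothetical and are not taken from `salesData`.

```r
library(dplyr)
library(corrplot)
library(ggplot2)

# Hypothetical toy data; names and values are illustrative only
set.seed(1)
toy <- data.frame(
  id            = 1:50,
  nItems        = rpois(50, lambda = 20),
  meanItemPrice = runif(50, min = 5, max = 30),
  brand         = factor(sample(c("A", "B", "C"), 50, replace = TRUE))
)
toy$sales <- toy$nItems * toy$meanItemPrice + rnorm(50, sd = 20)

# Compact overview of column types and first values
str(toy, give.attr = FALSE)

# Correlation matrix of the numeric columns, excluding the id
corMat <- toy %>%
  select_if(is.numeric) %>%
  select(-id) %>%
  cor()

# Plot the correlation matrix
corrplot(corMat)

# Distribution of sales for each level of the categorical variable
p <- ggplot(toy) +
  geom_boxplot(aes(x = brand, y = sales))
print(p)
```

Storing the correlation matrix in `corMat` before plotting is a choice made here for inspection; piping `cor()` directly into `corrplot()` works as well.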