Looking at data

The dataset salesData is loaded in the workspace. It contains customer information for months one to three; for month four, only the sales are included. The following table describes some of the variables whose meaning is less obvious.

Variable          Description
id                identification number of customer
mostFreqStore     store person bought mostly from
mostFreqCat       category person purchased mostly
nCats             number of different categories
preferredBrand    brand person purchased mostly
nBrands           number of different brands

The packages readr, dplyr, corrplot, and ggplot2 have been installed and loaded.

This exercise is part of the course

Machine Learning for Marketing Analytics in R

Exercise instructions

  • Use the structure command str() to get an overview of the data.
  • Now visualize the correlation of the continuous explanatory variables for the past three months with the sales variable of this month. Use the functions cor() and corrplot() and the pipe operator. Note that the right variables have already been selected for you.
  • Additionally, make a boxplot displaying the distribution of salesThisMon across the levels of the categorical variable preferredBrand. The same has already been done for the categorical variable mostFreqStore.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Structure of dataset
str(___, give.attr = FALSE)

# Visualization of correlations
salesData %>% select_if(is.numeric) %>%
  select(-id) %>%
  ___
  ___

# Frequent stores
ggplot(salesData) +
    geom_boxplot(aes(x = mostFreqStore, y = salesThisMon))

# Preferred brand
ggplot(___) +
    geom_boxplot(aes(x = ___, y = ___))
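
As a rough sketch, one way the blanks above might be filled in is shown below. Because the real salesData is only available in the course workspace, this example builds a small made-up stand-in with a handful of the columns named in the exercise; the values, factor levels, and the name brandPlot are assumptions for illustration only.

```r
library(dplyr)
library(corrplot)
library(ggplot2)

# Hypothetical stand-in for salesData: the real dataset lives in the course
# workspace. Values and factor levels below are invented for illustration.
set.seed(1)
salesData <- data.frame(
  id = 1:50,
  mostFreqStore = factor(sample(c("StoreA", "StoreB"), 50, replace = TRUE)),
  preferredBrand = factor(sample(c("BrandX", "BrandY", "BrandZ"), 50, replace = TRUE)),
  nCats = sample(1:8, 50, replace = TRUE),
  nBrands = sample(1:10, 50, replace = TRUE),
  salesThisMon = round(runif(50, 100, 2000), 2)
)

# Structure of dataset
str(salesData, give.attr = FALSE)

# Visualization of correlations: keep numeric columns, drop the id,
# compute the correlation matrix, then plot it
salesData %>% select_if(is.numeric) %>%
  select(-id) %>%
  cor() %>%
  corrplot()

# Preferred brand: distribution of this month's sales per brand level
brandPlot <- ggplot(salesData) +
  geom_boxplot(aes(x = preferredBrand, y = salesThisMon))
print(brandPlot)
```

With the actual workspace data, only the three filled-in pieces would be needed: salesData in str(), the cor() and corrplot() steps in the pipe, and preferredBrand and salesThisMon in the boxplot aesthetics.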