Looking at data

The dataset salesData is loaded in the workspace. It contains information on customers for months one to three; for month four, only the sales (salesThisMon) are included. The following table describes the variables whose meaning is less obvious.

Variable        Description
id              identification number of the customer
mostFreqStore   store the customer bought from most often
mostFreqCat     category the customer purchased from most often
nCats           number of different categories bought
preferredBrand  brand the customer purchased most often
nBrands         number of different brands bought
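To make the table concrete, here is a minimal sketch of a data frame with the same column layout. It is a hypothetical stand-in named toySales: all values are simulated, the store, category, and brand labels are invented, and the real salesData contains further columns that the table does not list.

```r
# Hypothetical stand-in for salesData: simulated values and invented labels,
# not the course data
set.seed(42)
n <- 6
toySales <- data.frame(
  id             = 1:n,                                    # customer identifier
  mostFreqStore  = factor(sample(c("Store A", "Store B"), n, replace = TRUE)),
  mostFreqCat    = factor(sample(c("Cat 1", "Cat 2", "Cat 3"), n, replace = TRUE)),
  nCats          = sample(1:10, n, replace = TRUE),        # number of distinct categories
  preferredBrand = factor(sample(c("Brand X", "Brand Y"), n, replace = TRUE)),
  nBrands        = sample(1:5, n, replace = TRUE),         # number of distinct brands
  salesThisMon   = round(runif(n, 50, 500), 2)             # simulated month-four sales
)

# Same overview call as in the exercise; give.attr = FALSE hides attributes
str(toySales, give.attr = FALSE)
```

The categorical variables are stored as factors, which is what lets them serve as the x axis of a boxplot later on.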

The packages readr, dplyr, corrplot, and ggplot2 have been installed and loaded.

This exercise is part of the course Machine Learning for Marketing Analytics in R.

Exercise instructions

Use the structure function str() to get an overview of the data.
Next, visualize the correlations between the continuous explanatory variables for the past three months and this month's sales variable, using cor(), corrplot(), and the pipe operator. The relevant variables have already been selected for you.
Additionally, make a boxplot showing the distribution of salesThisMon across the levels of the categorical variable preferredBrand. The same has already been done for the categorical explanatory variable mostFreqStore.
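The correlation and boxplot steps can be tried outside the course environment. This is a minimal sketch on simulated data, assuming dplyr, corrplot, and ggplot2 are installed; the data frame toySales is hypothetical and reuses only a few column names from the exercise (id, nCats, nBrands, preferredBrand, salesThisMon).

```r
library(dplyr)
library(corrplot)
library(ggplot2)

# Hypothetical simulated stand-in for salesData (not the course data)
set.seed(1)
n <- 200
toySales <- data.frame(
  id             = 1:n,
  nCats          = sample(1:10, n, replace = TRUE),
  nBrands        = sample(1:5, n, replace = TRUE),
  preferredBrand = factor(sample(c("Brand X", "Brand Y", "Brand Z"), n, replace = TRUE))
)
# Simulated month-four sales that rise with the number of categories bought
toySales$salesThisMon <- 40 * toySales$nCats + rnorm(n, sd = 60)

# Correlation matrix of the numeric columns, dropping the identifier
corMat <- toySales %>%
  select_if(is.numeric) %>%
  select(-id) %>%
  cor()

# Plot the matrix; the salesThisMon row shows its correlation with each predictor
corrplot(corMat)

# Distribution of salesThisMon for each level of preferredBrand
p <- ggplot(toySales) +
  geom_boxplot(aes(x = preferredBrand, y = salesThisMon))
print(p)
```

Dropping id before cor() keeps a meaningless identifier correlation out of the plot. In this simulated data, nCats should show a strong positive correlation with salesThisMon, while the brand boxplots should look similar because brand was assigned at random.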

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Structure of dataset
str(___, give.attr = FALSE)

# Visualization of correlations
salesData %>% select_if(is.numeric) %>%
  select(-id) %>%
  ___
  ___

# Frequent stores
ggplot(salesData) +
  geom_boxplot(aes(x = mostFreqStore, y = salesThisMon))

# Preferred brand
ggplot(___) +
  geom_boxplot(aes(x = ___, y = ___))