
Model extensions part 1: Dummy variables

1. Model extensions part 1: Dummy variables

Welcome back! In this lesson, you will learn how to use dummy variables to capture the effects of marketing activities.

2. Dummy variables

Dummy variables are indicator variables that usually take on the value zero or one to indicate the absence or presence of an effect that may be expected to shift the market outcome. For example, suppose the brewery installed a point-of-purchase display. Point-of-purchase displays are product displays that are meant to trigger impulse purchases when viewed. For such a display, we can construct a DISPLAY dummy variable that takes on the value one when the beer is on display and zero otherwise.
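As a minimal sketch of how such a dummy could be constructed in R, assume a hypothetical data frame with one row per week and a text column describing the promotion. The names `weeks` and `PROMO` and all values are made up for illustration; they are not part of the lesson's data.

```r
# Hypothetical weekly data; names and values are made up for illustration.
weeks <- data.frame(
  WEEK  = 1:6,
  PROMO = c("none", "display", "none", "display", "display", "none")
)

# DISPLAY is 1 in weeks with a display and 0 otherwise.
weeks$DISPLAY <- as.integer(weeks$PROMO == "display")

weeks
```

The comparison returns TRUE or FALSE for every week, and `as.integer()` turns that into the one/zero coding described above.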

3. Understanding dummy variables

Conceptually, in R, dummy variables are factors with two distinct category levels. In our example, the category levels, zero and one, indicate the absence or presence of a DISPLAY effect that might be expected to shift the log-SALES outcome. It is therefore useful to start by examining log-SALES separately for each category level of DISPLAY. The aggregate() function is well suited to this: it groups the data by the levels of DISPLAY, as specified in its formula argument, and then applies a descriptive function, such as the mean, to each group. The output of the aggregate function shows that the average log-SALES are a little higher for the weeks in which Hoppiness was on display.
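The grouping step might look like the sketch below. The data frame `sales.data` here is a small made-up stand-in, so its numbers will not match the output discussed in the lesson.

```r
# Made-up stand-in data: 8 weeks without and 4 weeks with a display.
sales.data <- data.frame(
  DISPLAY = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1),
  SALES   = c(60, 70, 65, 68, 62, 71, 66, 64, 100, 110, 105, 98)
)

# Average log(SALES) within each level of DISPLAY.
display.means <- aggregate(log(SALES) ~ DISPLAY, data = sales.data, FUN = mean)
display.means

# Other descriptive functions can be swapped in through FUN, e.g. the minimum.
display.min <- aggregate(log(SALES) ~ DISPLAY, data = sales.data, FUN = min)
display.min
```

The left-hand side of the formula is the quantity to summarise and the right-hand side names the grouping variable, so the result has one row per DISPLAY level.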

4. The effect of display on sales

To explain the effect of display activity on beer unit sales, the DISPLAY dummy is added as a predictor variable to the sales response function. The estimated coefficients describe the effect of a unit change in DISPLAY on log-SALES. Here the intercept is the average log-SALES for weeks without any display activity, and the slope is the difference in average log-SALES between weeks with and weeks without display activity. In other words, the slope describes the switch from no-DISPLAY to DISPLAY. Without DISPLAY activity, the log-SALES are 4.19. Because the model works on the log scale, we exponentiate this value, which gives around 66 sales units. If beer is on DISPLAY, the log-SALES increase by about 0.46, which is often read as an "approximate" 46 percent increase. That reading is only a rough approximation for a coefficient of this size. The actual difference between DISPLAY and no-DISPLAY, obtained by exponentiating the coefficient and subtracting one, is closer to a 60 percent shift in unit sales.
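The interpretation above can be sketched in R as follows. The data frame is again a made-up stand-in, and the object names `display.model`, `baseline.units`, and `pct.shift` are hypothetical, so the estimates will differ from the lesson's 4.19 and 60 percent.

```r
# Made-up stand-in data: 8 weeks without and 4 weeks with a display.
sales.data <- data.frame(
  DISPLAY = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1),
  SALES   = c(60, 70, 65, 68, 62, 71, 66, 64, 100, 110, 105, 98)
)

# Log-linear sales response model with the DISPLAY dummy as predictor.
display.model <- lm(log(SALES) ~ DISPLAY, data = sales.data)
b <- coef(display.model)

# Intercept: average log(SALES) in weeks without display.
# Back-transforming gives the baseline in sales units.
baseline.units <- exp(b[["(Intercept)"]])

# Exact percentage shift in unit sales when DISPLAY switches from 0 to 1.
pct.shift <- (exp(b[["DISPLAY"]]) - 1) * 100

# Back-transforming the intercept quoted in the lesson.
exp(4.19)  # about 66 units
```

With a single dummy predictor, the intercept equals the group mean for DISPLAY = 0 and the slope equals the difference between the two group means, which is why the coefficients line up with the aggregate() output.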

5. The effect of multiple dummies on sales (1)

Usually, point-of-purchase displays are not the only marketing actions taken by a company. Within retail, a common strategy to increase product sales is to distribute manufacturer coupons, which offer a financial discount on the purchase of the product. Often, coupon and display actions are combined and run at the same time. We calculate the average log-SALES for DISPLAY only, COUPON only, and the combination of DISPLAY and COUPON by adding the corresponding variables one after another to the formula argument of aggregate(). Again, we see that the average log-SALES are lowest for weeks without any promotion activity and highest for the combination of DISPLAY and COUPON.
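A sketch of the multi-variable grouping is shown below. It assumes that the combined promotion is coded as its own dummy, here given the hypothetical name `DISPLAYCOUPON`, and that the three dummies are mutually exclusive. The data are made up for illustration.

```r
# Made-up stand-in data; DISPLAYCOUPON is an assumed name for the combined promotion.
sales.data <- data.frame(
  DISPLAY       = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0),
  COUPON        = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0),
  DISPLAYCOUPON = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1),
  SALES         = c(60, 70, 65, 68, 100, 110, 105, 120, 130, 125, 200, 210)
)

# Average log(SALES) for every combination of dummy values that occurs in the data.
promo.means <- aggregate(log(SALES) ~ DISPLAY + COUPON + DISPLAYCOUPON,
                         data = sales.data, FUN = mean)
promo.means
```

The result has one row per promotion type that actually appears, so the row with all three dummies at zero gives the average for weeks without any promotion.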

6. The effect of multiple dummies on sales (2)

We can explain the effects of all marketing activities in a single model by defining an additive relationship for the effects of DISPLAY, COUPON, and the combination of DISPLAY and COUPON in the formula argument of the linear model function lm(). We store the result in an object named dummy.model. The estimated coefficients show that, compared to weeks without promotion activity, the combination of DISPLAY and COUPON activity has the largest effect.
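One way this model could be specified is sketched below, again assuming the combined promotion is coded as a separate, mutually exclusive dummy with the hypothetical name `DISPLAYCOUPON`. The data are made up, so the coefficients are purely illustrative.

```r
# Made-up stand-in data; DISPLAYCOUPON is an assumed name for the combined promotion.
sales.data <- data.frame(
  DISPLAY       = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0),
  COUPON        = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0),
  DISPLAYCOUPON = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1),
  SALES         = c(60, 70, 65, 68, 100, 110, 105, 120, 130, 125, 200, 210)
)

# Additive model: each dummy shifts log(SALES) relative to weeks without promotion.
dummy.model <- lm(log(SALES) ~ DISPLAY + COUPON + DISPLAYCOUPON, data = sales.data)
coef(dummy.model)

# Exact percentage shifts in unit sales relative to the no-promotion baseline.
(exp(coef(dummy.model)[-1]) - 1) * 100
```

Under this coding, each coefficient is the difference between that promotion's average log-SALES and the no-promotion average, so the weeks without promotion act as the reference category.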

7. What about price?

When explaining the effects of marketing activities on sales, we should not forget about the effect of changes in PRICE. We use the update() function to add the missing PRICE predictor to the previously estimated dummy.model and re-estimate it. The coefficients of the re-estimated model show that the effect of PRICE is minor compared to the effects of the DISPLAY and COUPON activities.
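The update step might be written as in the sketch below. The data, the `DISPLAYCOUPON` column name, and the object name `price.model` are assumptions made for illustration, and the resulting estimates say nothing about the lesson's actual results.

```r
# Made-up stand-in data; DISPLAYCOUPON is an assumed name for the combined promotion.
sales.data <- data.frame(
  DISPLAY       = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0),
  COUPON        = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0),
  DISPLAYCOUPON = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1),
  PRICE         = c(1.10, 1.05, 1.12, 1.08, 1.10, 1.02, 1.06, 1.09, 1.03, 1.07, 1.04, 1.08),
  SALES         = c(60, 70, 65, 68, 100, 110, 105, 120, 130, 125, 200, 210)
)

dummy.model <- lm(log(SALES) ~ DISPLAY + COUPON + DISPLAYCOUPON, data = sales.data)

# ". ~ . + PRICE" keeps the response and all existing predictors and adds PRICE.
price.model <- update(dummy.model, . ~ . + PRICE)
coef(price.model)
```

Using update() avoids retyping the full formula: the dots stand for the existing left-hand and right-hand sides, and the model is refitted on the same data.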

8. Let's practice!

Great! Now, let’s pimp our model.