1. Model selection
In this lesson we extend the logistic response model to explain the effect of marketing activities on customer purchase decisions.
2. Extending the logistic response model (1)
This time, we are interested in the effects of point-of-purchase displays and product featuring on the probability of purchasing HOPPINESS. Like point-of-purchase displays, features such as shelf talkers tout product benefits to encourage impulse purchases.
We start by summarizing the dummy variables DISPLAY, FEATURE, and FEATURE combined with DISPLAY for HOPPINESS with the function summary(). The means show that each of these marketing activities is present in fewer than 5 percent of the observations.
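The "fewer than 5 percent" reading comes from the mean of each dummy, because the mean of a 0/1 variable is the share of observations coded 1. Here is a minimal Python sketch of that step. The lesson itself works in R, and the toy column below is hypothetical and not the lesson's data.

```python
def share_present(dummy):
    """Share of observations where a 0/1 marketing dummy equals 1.

    For a 0/1 variable this is simply its mean, which is the "Mean"
    line that R's summary() prints for a numeric dummy column.
    """
    if not dummy:
        raise ValueError("dummy column is empty")
    if any(v not in (0, 1) for v in dummy):
        raise ValueError("expected a 0/1 dummy variable")
    return sum(dummy) / len(dummy)


# Hypothetical toy column: a display in 2 of 50 observations.
display = [1, 1] + [0] * 48
print(share_present(display))  # 0.04
```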
3. Extending the logistic response model (2)
Next, we extend the logistic response model by entering price.ratio and the marketing activity predictors additively in the formula argument of the generalized linear model function glm(). We store the result in an object named extended.model and obtain the average marginal effects with the function margins().
In a model with multiple predictors, the average marginal effect of a predictor is interpreted as the average change in purchase probability associated with a one-unit change in that predictor, holding all other predictors constant.
It turns out that the combined presence of FEATURE and DISPLAY has the largest effect: it increases the purchase probability for HOPPINESS by about 10 percentage points.
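The two steps above are the additive logistic response function and the average of its marginal effects over observations. The following Python sketch shows both. It makes these assumptions: coefficients are supplied as a plain dict (a hypothetical layout), and the 0/1 dummy can be handled in one of two ways. One is the derivative p(1 - p)β. The other is the discrete change from 0 to 1. Whether margins() reports one or the other depends on how the dummy is coded in R, so treat the choice here as illustrative.

```python
import math


def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predict(coefs, row):
    """Purchase probability for one observation under an additive logit model.

    coefs: dict with an "intercept" key plus one key per predictor.
    row:   dict mapping predictor names to their values.
    """
    eta = coefs["intercept"] + sum(coefs[name] * row[name] for name in row)
    return logistic(eta)


def ame_continuous(coefs, rows, name):
    """Average marginal effect of a continuous predictor.

    For the logit link dp/dx = p * (1 - p) * beta, averaged over observations.
    """
    beta = coefs[name]
    probs = [predict(coefs, r) for r in rows]
    return sum(p * (1 - p) * beta for p in probs) / len(probs)


def ame_dummy(coefs, rows, name):
    """Average marginal effect of a 0/1 predictor as a discrete change.

    Each observation is predicted with the dummy set to 1 and to 0,
    keeping its other predictors at their observed values.
    """
    diffs = [predict(coefs, {**r, name: 1}) - predict(coefs, {**r, name: 0})
             for r in rows]
    return sum(diffs) / len(diffs)


# Hypothetical coefficients and observations, not estimates from the lesson.
coefs = {"intercept": -3.0, "price_ratio": -2.0, "feature_display": 1.2}
rows = [{"price_ratio": 1.1, "feature_display": 0},
        {"price_ratio": 0.9, "feature_display": 1}]
print(round(ame_dummy(coefs, rows, "feature_display"), 3))
```

Because the result is a difference in probabilities, an average marginal effect of 0.10 reads as 10 percentage points.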
4. Summarizing the model
To evaluate extended.model's goodness of fit and the relevance of its predictors, we again use the function summary(), which compactly displays all the required information.
As in chapter two, we look at the columns with the test statistics and the corresponding p-values. For a logistic model, summary() reports Wald z values rather than t values. Based on these, we would exclude the DISPLAY predictor from the model, because its p-value is much larger than the commonly used critical value of 0.05.
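For a logistic model, each "z value" column entry is the estimate divided by its standard error. The p-value is the two-sided tail probability of a standard normal. Below is a minimal Python sketch of that calculation. The estimate and standard error in the usage line are hypothetical.

```python
import math


def wald_z(estimate, std_error):
    """Wald z statistic: coefficient estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p(z):
    """Two-sided p-value from the standard normal, P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical estimate and standard error for a weak predictor.
p = two_sided_p(wald_z(0.2, 0.3))
print(p > 0.05)  # True, so the predictor would be a candidate for removal
```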
Unlike before, we can no longer rely on the familiar R-squared measure to judge the goodness of fit of our model. Instead, R reports two forms of deviance: the null deviance and the residual deviance. The null deviance shows how well the response variable is predicted by a model that includes only the intercept. The residual deviance shows how well it is predicted by the model that includes the additional predictors. A lower deviance means a better fit.
For the intercept-only model, we obtain a value of 1820. Including the price ratio and the marketing activity predictors decreases the deviance to 1275.8. These values have little intuitive meaning on their own, so we need a standard for judging the deviance of the extended model relative to the deviance of the null model.
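For a 0/1 response, the deviance equals minus twice the log-likelihood, because the saturated model fits every observation perfectly. The intercept-only model predicts the same probability for every observation: the sample share of purchases. Here is a minimal Python sketch of both quantities. The toy responses in the usage line are hypothetical.

```python
import math


def binary_deviance(y, p):
    """Deviance of a 0/1 response model: -2 times its log-likelihood.

    y: observed 0/1 responses; p: fitted purchase probabilities.
    """
    ll = 0.0
    for yi, pi in zip(y, p):
        ll += math.log(pi) if yi == 1 else math.log(1.0 - pi)
    return -2.0 * ll


def null_deviance(y):
    """Deviance of the intercept-only model, whose fitted probability is mean(y)."""
    ybar = sum(y) / len(y)
    return binary_deviance(y, [ybar] * len(y))


# Hypothetical toy responses: 1 purchase in 4 observations.
print(round(null_deviance([1, 0, 0, 0]), 3))
```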
5. The deviance principle
To this end, we fit an intercept-only model and assign the result to an object called null.model. We compare the deviance of extended.model against the deviance of null.model with the function anova(). The reduction in deviance is tested with a likelihood-ratio test, which we request with the additional argument test = "Chisq".
For null.model, we again obtain a value of 1820. Including the additional predictors decreases the deviance to 1275.8. This reduction is significant, so extended.model fits the data better than a model without any predictors.
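The likelihood-ratio test compares the drop in deviance with a chi-square distribution. Its degrees of freedom equal the number of added coefficients. Below is a minimal Python sketch that uses the rounded deviances quoted above. It assumes df = 4, one coefficient each for price.ratio, DISPLAY, FEATURE and FEATURE combined with DISPLAY. To stay dependency-free, it only handles even degrees of freedom, where the chi-square tail has a closed form.

```python
import math


def chisq_sf_even_df(x, df):
    """Upper tail P(X >= x) of a chi-square distribution with even df.

    For df = 2m this equals exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def lr_test(null_dev, model_dev, df):
    """Likelihood-ratio test: drop in deviance versus a chi-square(df)."""
    stat = null_dev - model_dev
    return stat, chisq_sf_even_df(stat, df)


# Rounded deviances from the lesson; df = 4 is an assumption (see above).
stat, p = lr_test(1820.0, 1275.8, df=4)
print(round(stat, 1))  # 544.2
print(p < 1e-100)      # True
```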
6. Eliminating predictors
The general principles of variable selection we learned in chapter two also apply to logistic response models, because the AIC is based on the deviance.
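The link between the two is direct for a 0/1 response. The deviance is -2 times the log-likelihood, so the AIC is the deviance plus twice the number of estimated coefficients. Here is a minimal Python sketch using the rounded deviances above. It assumes the extended model has five coefficients: the intercept plus four predictors.

```python
def aic_binary(deviance, n_coefficients):
    """AIC of a 0/1-response logistic model: deviance + 2 * number of coefficients."""
    return deviance + 2 * n_coefficients


# Rounded deviances from the lesson; coefficient counts are assumptions.
print(round(aic_binary(1820.0, 1), 1))  # 1822.0
print(round(aic_binary(1275.8, 5), 1))  # 1285.8
```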
Again, we use the function stepAIC() from the add-on package MASS to perform backward elimination of unnecessary predictors. stepAIC() returns the model with the smallest AIC. Applying summary() to it shows that this model includes price.ratio and the FEATURE and FEATURE plus DISPLAY predictors. The DISPLAY predictor has been excluded because it adds little explanatory power.
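Backward elimination by AIC starts from the full model and repeatedly drops the predictor whose removal lowers the AIC the most. It stops when no single removal helps. The following Python sketch shows that greedy loop, similar in spirit to stepAIC() with backward direction but not a reimplementation of it. The scoring function toy_aic and its per-predictor deviance reductions are hypothetical and treated as additive purely for illustration.

```python
def backward_eliminate(predictors, aic_of):
    """Greedy backward elimination by AIC.

    predictors: names of the predictors in the full model.
    aic_of:     callable returning the AIC of a model fitted on a tuple of names.
    """
    current = list(predictors)
    current_aic = aic_of(tuple(current))
    while current:
        # AIC after dropping each remaining predictor in turn.
        candidates = [(aic_of(tuple(p for p in current if p != drop)), drop)
                      for drop in current]
        best_aic, best_drop = min(candidates)
        if best_aic >= current_aic:
            break  # no single removal lowers the AIC
        current.remove(best_drop)
        current_aic = best_aic
    return current, current_aic


# Hypothetical deviance reductions per predictor (illustration only).
GAIN = {"price.ratio": 300.0, "DISPLAY": 0.5, "FEATURE": 100.0, "FEATUREDISPLAY": 140.0}


def toy_aic(names):
    deviance = 1000.0 - sum(GAIN[n] for n in names)
    return deviance + 2 * (len(names) + 1)  # +1 for the intercept


print(backward_eliminate(list(GAIN), toy_aic)[0])
# ['price.ratio', 'FEATURE', 'FEATUREDISPLAY']
```

In this toy setup, DISPLAY is dropped because its deviance reduction (0.5) is smaller than the AIC penalty of 2 for one extra coefficient.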
7. Let's practice!
Let’s go and crush some models in practice.