Inspecting choice data

1. Inspecting choice data

One thing that trips up new choice modelers is that choice data doesn't fit into the usual format we use for predictive modeling. So, let's take a look at how choice data is structured.

2. Data for linear regression

We usually organize data in rows where each row represents one observation. For the data here, each row is an observation of sales at a store and we have some information about the characteristics of each store. In this data, the number of rows is the number of observations.
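To make the contrast concrete, here is a minimal R sketch of regression-style data. The data frame `stores` and its columns (`sales`, `sq_feet`, `downtown`) are hypothetical and made up for illustration. The point is only that one row is one observation, so `nrow()` and the number of observations used by a fitted `lm()` agree.

```r
# Hypothetical store-level data (illustrative values, not from the course):
# one row per store, and each row is one observation.
stores <- data.frame(
  store    = c("A", "B", "C", "D"),
  sales    = c(120, 95, 143, 88),        # sales at each store (made-up units)
  sq_feet  = c(2000, 1500, 2600, 1400),  # store characteristic
  downtown = c(TRUE, FALSE, TRUE, FALSE) # store characteristic
)

# Regress sales on the store characteristics
fit <- lm(sales ~ sq_feet + downtown, data = stores)

# The number of rows is the number of observations used by the model
nrow(stores)  # 4
nobs(fit)     # 4
```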

3. Data for a choice model

In a typical choice dataset, we observe someone making a choice from a set of options that are described by a common set of features. It's convenient to stack up the options for one choice observation into multiple rows, where each row describes one of the alternatives. For instance, the first three rows of this data describe a choice among three different sports cars. The first car was a 2-seater with a manual transmission for 35 thousand dollars. The other two options were both automatic 5-seaters, one at 40 thousand dollars and the other at 30 thousand.

To keep track of which rows belong to which observation, we have columns called ques (short for question) and alt (short for alternative). The first three values of ques are all 1s, indicating that these three rows all belong to question 1. The values of alt are 1, 2 and 3, indicating that these are the three alternatives the customer chose from. The choice is recorded in the column labeled choice as a 0 or 1 for each option, and, of course, only one option is chosen in each observed choice. In question 1, the third option was chosen.

The next three rows describe another choice. It also has three alternatives, but that doesn't have to be the case: some of the observed choices may have four or five or more options. The important thing to realize is that there is a row in the data frame for each alternative that was available, and a set of rows makes up one observed choice.
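The long layout can be sketched directly in R. Question 1 below follows the example in the text (a manual 2-seater at 35 thousand, automatic 5-seaters at 40 and 30 thousand, third option chosen). The column names `seat` and `trans`, and all of question 2's values, are assumptions chosen only to illustrate the structure; prices are in thousands of dollars.

```r
# Long-format choice data: one row per alternative, several rows per choice.
# Question 1 mirrors the example in the text; question 2 is hypothetical.
choice_data <- data.frame(
  ques   = c(1, 1, 1, 2, 2, 2),    # which observed choice (question) the row belongs to
  alt    = c(1, 2, 3, 1, 2, 3),    # which alternative within that question
  seat   = c(2, 5, 5, 5, 2, 2),    # number of seats (column name assumed)
  trans  = c("manual", "auto", "auto", "auto", "manual", "auto"),  # column name assumed
  price  = c(35, 40, 30, 30, 35, 40),  # price in thousands of dollars
  choice = c(0, 0, 1, 0, 1, 0)     # 1 marks the alternative that was chosen
)

nrow(choice_data)                 # 6 rows, one per alternative shown
length(unique(choice_data$ques))  # 2 observed choices
table(choice_data$ques)           # number of alternatives in each question

# Sanity check: exactly one alternative is chosen in every question
stopifnot(all(tapply(choice_data$choice, choice_data$ques, sum) == 1))

# All rows that make up question 1
choice_data[choice_data$ques == 1, ]
```

Because the number of alternatives can differ between questions, counting rows tells you how many alternatives were shown, not how many choices were observed; counting the distinct values of `ques` gives the latter.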

4. Summarizing choice data with choice counts

Our ultimate goal is to fit a multinomial logit model to choice data, but before we do, we should compute some descriptive statistics to get a feel for what's going on in the data. With choice data, what we really want to know is what people are choosing. One way to get a sense of this is to count the number of times a 30 thousand dollar sports car is chosen in the data and compare that to the number of times a 35 or 40 thousand dollar sports car is chosen. We can do this using the function xtabs(), which, as you can see here, takes two inputs: a formula and a data frame. In the code here, the formula choice ~ price says "sum up the choice variable separately for each level of price". Because choice is a 0 or 1 indicating whether that alternative was chosen, the output is a count of the number of times a sports car was chosen at 30, 35, and 40 thousand dollars. From the output, we can see that this data includes 1,010 choices where the chosen car was priced at 30 thousand dollars and only 324 choices of cars priced at 40 thousand dollars. Not much of a surprise there: people like cheaper cars! By the way, you could do this same calculation using the dplyr package if you prefer, but I find the formula input for xtabs() convenient.
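The counting step can be tried on a small toy data set. The data frame `choice_data` below is hypothetical (four made-up questions, not the course's sports car data), so its counts will not match the 1,010 and 324 quoted above. `xtabs(choice ~ price, ...)` sums the 0/1 `choice` column within each price level, and `prop.table()` turns those counts into the share of choices made at each price.

```r
# Hypothetical long-format choice data: 4 questions x 3 alternatives.
choice_data <- data.frame(
  ques   = rep(1:4, each = 3),
  alt    = rep(1:3, times = 4),
  price  = c(35, 40, 30,  30, 35, 40,  40, 30, 35,  35, 30, 40),  # thousands of dollars
  choice = c( 0,  0,  1,   1,  0,  0,   0,  0,  1,   0,  0,  1)   # 1 = chosen
)

# Sum the 0/1 choice indicator separately for each level of price:
# the result is the number of times an alternative at each price was chosen.
counts <- xtabs(choice ~ price, data = choice_data)
counts   # 2 choices at 30, 1 at 35, 1 at 40

# Share of all observed choices at each price level
shares <- prop.table(counts)
shares   # 0.50, 0.25, 0.25

# Base-R alternative that returns a data frame instead of a table
aggregate(choice ~ price, data = choice_data, FUN = sum)

# dplyr alternative (needs the dplyr package installed):
# library(dplyr)
# choice_data %>% group_by(price) %>% summarise(n_chosen = sum(choice))
```

Since each question contributes exactly one chosen row, the counts always add up to the number of observed choices, which makes the shares easy to read as "fraction of choices at each price".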

5. Let's look at some choice data in R!

So that you can get a feel for this, let's take a look at the sports car data in R.

Choice Modeling for Marketing in R

Our goal for this chapter is to get you through the entire choice modeling process as quickly as possible, so that you get a broad understanding of what we can do with choice models and how the choice modeling process works. The main idea here is that we can use a choice model to understand how customers' product choices depend on the features of those products. Do sports car buyers prefer manual transmissions to automatic? By how much? To give you an overview, we will skip over many of the details. In later chapters, we will go back and cover important issues in preparing data, specifying and interpreting models, and reporting your findings, so that you are fully prepared to use these methods with your own choice data.