Visualizing choice data

1. Visualizing choice data

We've done a lot of work preparing our data. We took data in wide format and converted it to long. We took data in two files and merged it into one. All of these are techniques you might need to get your data into long format. Once your data is in long format, we can easily summarize it to see what people chose.

2. Summarizing with xtabs()

The function xtabs() counts the number of times each level of a factor variable occurs in a data frame. It uses a syntax that is very similar to lm(), which you have probably seen before: the first input is a formula and the second is the name of a data frame. Here we have ~ trans as the formula, which tells xtabs() to count how many times each level of trans appears in sportscar. You can see from the output that there were 3001 alternatives with automatic transmission and 2999 with manual transmission. That only tells us how often these alternatives appear in the choice questions. It would be more helpful to look at how often each type of transmission is chosen. To do that, we can change the formula to ~ trans + choice. The output of xtabs() then gives us the number of times that the automatic and manual transmissions appeared in a question and were chosen or not chosen. It looks like sportscars with automatic transmissions were chosen 1328 times, which is much more often than manual transmissions at 672. Finally, if we use the formula choice ~ trans, xtabs() will sum the choice variable for each level of trans, which tells us directly how many times auto and manual were chosen.
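To make the three formulas concrete, here is a minimal sketch on a tiny made-up data frame. The `toy` data and its values are assumptions for illustration only (the sportscar data is not reproduced here); only the column names trans and choice follow the transcript.

```r
# Hypothetical stand-in for sportscar: 3 questions x 3 alternatives,
# with exactly one chosen alternative (choice = 1) per question.
toy <- data.frame(
  ques   = rep(1:3, each = 3),
  trans  = factor(c("auto", "manual", "auto",
                    "manual", "auto", "manual",
                    "auto", "auto", "manual")),
  choice = c(1, 0, 0,
             0, 1, 0,
             0, 0, 1)
)

# How often each transmission appears among the alternatives
appear <- xtabs(~ trans, data = toy)

# Appearances split by not chosen (0) and chosen (1)
cross <- xtabs(~ trans + choice, data = toy)

# Sum of choice within each level of trans, i.e. times chosen
chosen <- xtabs(choice ~ trans, data = toy)

print(appear)
print(cross)
print(chosen)
```

The last table should match the choice = 1 column of the cross-tabulation, since summing a 0/1 variable is the same as counting the rows where it equals 1.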

3. Plotting the output of xtabs()

The output of the xtabs() function is an xtabs object. If we pass an xtabs object into the plot() function, plot() knows exactly what to do. The result is a useful mosaic plot. In this mosaic plot, the width of each bar is proportional to the number of times automatic and manual transmissions appear in the sportscar data. As you can see, the widths of the bars are about equal, which means that automatic and manual transmissions appear among the alternatives about the same number of times. This is typical for survey data but may not happen in data from real markets. The more useful thing to look at is the height of the bars, which is proportional to the number of times each alternative is chosen or not chosen relative to how often it appears. Each question asked the respondent to choose one of three alternatives, so the total area for choice = 1 should be one third. By comparing the number of automatics that are chosen to the number of manuals, we can see that automatics are chosen about twice as often as manuals. So, we have a quick visual that tells us how often people are choosing manual and automatic transmissions. It's a good idea to look at choice counts for every attribute in the data.
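The widths, heights, and one-third area can be checked with a little arithmetic. The sketch below rebuilds the two-way table from the counts quoted in the transcript rather than from the sportscar data itself; the "not chosen" cells are an assumption, derived by subtracting the chosen counts from the appearance counts.

```r
# Counts quoted in the transcript; choice = 0 cells are derived by subtraction.
counts <- as.table(matrix(
  c(3001 - 1328, 2999 - 672,   # choice = 0 (not chosen)
    1328,        672),         # choice = 1 (chosen)
  nrow = 2,
  dimnames = list(trans = c("auto", "manual"), choice = c("0", "1"))
))

# A two-way table passed to plot() is drawn as a mosaic plot
plot(counts, main = "Transmission by choice")

# Bar widths: share of alternatives with each transmission
width <- prop.table(margin.table(counts, 1))

# Bar heights: share chosen / not chosen within each transmission
height <- prop.table(counts, margin = 1)

# Total area for choice = 1: one alternative chosen out of three per question
area_chosen <- sum(counts[, "1"]) / sum(counts)

# Ratio of chosen automatics to chosen manuals
ratio <- counts["auto", "1"] / counts["manual", "1"]

print(round(width, 3))
print(round(height, 3))
print(area_chosen)
print(round(ratio, 2))
```

With these counts the two widths come out almost exactly equal, the chosen area is one third, and the ratio of chosen automatics to chosen manuals is just under two.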

4. Transmission choice by segment

We can also bring a second variable into our plot. Here I've added the segment variable, which describes which consumer segment the respondent belongs to. There are three segments of respondents: those looking for a basic sportscar, those who are focused on a fun car, and racers who want to use their sportscar to race (legally, on a racetrack, of course). At the bottom of the plot we can see that the racers are more likely to choose a manual transmission and less likely to choose an automatic than the other two segments.
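One way to bring a second variable into the table is to add it to the right-hand side of the formula. The sketch below is an assumption about how such a plot could be built, not necessarily the exact call used on the slide, and the `toy` data frame is made up for illustration (two questions per segment, three alternatives each).

```r
# Hypothetical data: 3 segments x 2 questions x 3 alternatives
toy <- data.frame(
  segment = factor(rep(c("basic", "fun", "racer"), each = 6)),
  trans   = factor(rep(c("auto", "manual", "auto",
                         "manual", "auto", "manual"), times = 3)),
  choice  = c(1, 0, 0, 0, 1, 0,    # basic: picks auto twice
              0, 0, 1, 1, 0, 0,    # fun:   picks auto once, manual once
              0, 1, 0, 0, 0, 1)    # racer: picks manual twice
)

# Times each transmission was chosen, split by segment
by_seg <- xtabs(choice ~ trans + segment, data = toy)
print(by_seg)

# Share of each segment's choices going to each transmission
share <- prop.table(by_seg, margin = 2)
print(share)

# Mosaic plot of chosen counts by transmission and segment
plot(by_seg, main = "Chosen transmission by segment")
```

In this made-up example the racer column puts all of its choices on manual, which is an exaggerated version of the pattern described for the racer segment.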

5. Let's practice!

Now let's create some plots to see what types of chocolate people choose.