
Fitting and interpreting a choice model

1. Fitting and interpreting a choice model

Now that we've inspected the data, we are ready to fit a choice model. The process is very similar to fitting a regression model, so let's start with a quick refresher on that.

2. Fitting a linear model with lm()

To fit a linear regression model, we use the function lm(). When we type this command, we are telling R to fit a model that predicts y as a function of x1, x2, and x3 using the data in the my_data data frame. If my_data doesn't include columns named y, x1, x2, and x3, you will get an error. We usually take the output of lm() and assign it to a model object that we can use later; here we are assigning it to my_model. Once we have the my_model object, we can see a summary of the model by typing summary(my_model).
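The steps above can be sketched as follows. The column names y, x1, x2, and x3 come from the slide; the data itself is simulated here just so the example runs, since the course data frame isn't shown:

```r
set.seed(42)  # simulated stand-in for the course's my_data
my_data <- data.frame(y  = rnorm(100),
                      x1 = rnorm(100),
                      x2 = rnorm(100),
                      x3 = rnorm(100))

# Fit the model and store it in a model object for later use
my_model <- lm(y ~ x1 + x2 + x3, data = my_data)

# Print the coefficient table, R-squared, and other diagnostics
summary(my_model)
```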

3. Fitting a choice model with mlogit()

The process for fitting a choice model is very similar to fitting a linear regression model, except that we use a different function called mlogit(). Multinomial logit models are somewhat specialized, so you can't estimate them with lm() or even with the glm() function that you may have used before. Instead, we use the mlogit() function from the mlogit package. Just as with lm(), there are two key inputs to mlogit(): a formula and the name of the data frame where the data is stored. The data input is pretty straightforward, but the data has to be choice data. That means it has to have a column that indicates which choice observation each alternative belongs to; here that is the ques column. It also has to have a column of 0s and 1s indicating which option was chosen; here that is labeled choice. The formula should always begin with the name of the column that indicates the choice, because that is what we want to predict. Then we type a tilde, and after the tilde we list the names of the product features we want to use to predict the choice. Just like with lm(), we also indicate which data frame we want to use to fit the model. Under the hood, the model that we fit with mlogit() is different from the linear model we fit with lm(). For now, we are going to skip over the details of how they differ, but we'll come back to that in Chapter 3.
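Here is a minimal sketch of that workflow. The course's own data isn't reproduced here, so this simulates a small choice data set; the column names ques and choice match the slide, while alt and feature1 through feature3 are hypothetical stand-ins for the alternative index and product features. The mlogit.data() call converts an ordinary long-format data frame into the indexed format mlogit() expects:

```r
library(mlogit)

set.seed(1)
n_q <- 50  # number of choice questions, 3 alternatives each
d <- data.frame(
  ques     = rep(1:n_q, each = 3),   # which question each row belongs to
  alt      = rep(1:3, times = n_q),  # alternative index within a question
  feature1 = runif(n_q * 3),
  feature2 = runif(n_q * 3),
  feature3 = runif(n_q * 3)
)
# Mark one alternative per question as chosen (random here, just to
# produce the required column of 0s and 1s)
chosen   <- sample(1:3, n_q, replace = TRUE)
d$choice <- as.integer(d$alt == chosen[d$ques])

# Index the data by question and alternative, then fit the model.
# The 0 in the formula suppresses alternative-specific intercepts.
choice_data <- mlogit.data(d, choice = "choice", shape = "long",
                           chid.var = "ques", alt.var = "alt")
m <- mlogit(choice ~ 0 + feature1 + feature2 + feature3,
            data = choice_data)
summary(m)
```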

4. Summary of mlogit() model object

When we ask for a summary of the mlogit model object, we get output that looks a lot like what you would get from a regression. The most important part of the summary output is the table of coefficients. We will go into more detail on all of the output later, but for now, let's focus on the column labeled Estimate. The numbers in this column represent the relative value that customers place on each feature. For example, the coefficient for feature3low is -1.29, which means that people prefer the high level of feature3 to the low level. Just like with linear regression, the stars on the right-hand side of the table indicate which features have a statistically significant effect on choice. We'll go into more detail on how to interpret these parameters in Chapter 3, but for now, keep in mind that parameters greater than 1 or less than -1 indicate a very strong preference for a feature; the closer a coefficient is to zero, the weaker the preference.
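The coefficient table described above can also be extracted programmatically from a summary object. Since the sports-car data isn't reproduced here, this sketch uses a small lm() fit purely for illustration; the same idiom of pulling the Estimate column out of summary()$coefficients applies to model summaries generally:

```r
set.seed(7)  # simulated illustration data
d   <- data.frame(y = rnorm(30), x1 = rnorm(30))
fit <- lm(y ~ x1, data = d)

# The coefficient table: one row per parameter, with an Estimate column
coefs <- summary(fit)$coefficients
coefs[, "Estimate"]  # just the fitted coefficient values
```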

5. Let's find out how people value the features of sports cars.

Next, let's find out how people value the features of sports cars by fitting a choice model to the sports car data.