1. Heterogeneity in preference for other features
Now that we know there is a lot of heterogeneity in price sensitivity, we should think about heterogeneity in preferences for other attributes. We would expect people to have different preferences for the features of sports cars or for different types of chocolates, and we can incorporate this into our choice models. But these other attributes are factors, and when we fit a hierarchical model, it is important to think about how those factors are coded. So, before we get to a model with heterogeneity in all the parameters, I'd like to take a minute to talk about a different way to code factors that works better for hierarchical models.
2. A different way to code factors
In the last chapter, we used dummy coding. For example, the seat factor was represented by two variables, seat4 and seat5, where seat4 is one if the car has four seats and zero otherwise. In the table, the rows represent the original levels of the seat factor and the columns represent the dummy-coded variables that will be included in the model.
With effects coding, we still have two variables, but the first one takes the value of one if the car has four seats, zero if the car has five seats, and -1 if the car has two seats, as you can see here. This coding may look a bit funny, but it makes the coefficients for seat4 and seat5 relative to the average of all the levels instead of relative to the left-out base level.
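Base R can generate both tables directly. A minimal sketch, with one assumption: the level orders below are chosen by me so that each function reproduces the table described above (`contr.sum()` always assigns -1 to whichever level is listed last).

```r
# Dummy (treatment) coding: the first level is the base, coded all zeros.
# Listing "2" first makes 2-seat cars the base level, as in the last chapter.
contr.treatment(c("2", "4", "5"))

# Effects (sum) coding: the last level gets -1 in every column.
# Listing "2" last reproduces the effects-coding table described here.
contr.sum(c("4", "5", "2"))
```

The printed matrices match the two tables: rows are the factor levels, columns are the coded variables that enter the model.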
3. Changing the coding for a factor
Every R factor has a set of contrasts, which defines the coding scheme to use for that factor. R's default is dummy coding, but we can change that.
In this line of code, `contr.sum(levels(sportscar$seat))`, I create an effects coding scheme for the seat factor. When we assign the output of `contr.sum()` to `contrasts(sportscar$seat)`, we store the effects coding scheme with the data frame. That way, when we use this data later, the modeling function will know how we want this factor coded. This works for most modeling functions in R, including `lm()`, `glm()`, and `mlogit()`.
Unfortunately, `contr.sum()` doesn't label the new variables very nicely, so with this second line of code, I'm relabeling them to make the results a bit easier to read later.
Once we run this code, we can see the new coding scheme by typing `contrasts(sportscar$seat)`.
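Here is a self-contained sketch of these steps, using a toy factor as a stand-in for sportscar$seat. The level order, with "2" listed last so it gets the -1 row, and the relabeling approach are my assumptions rather than the lesson's exact code.

```r
# Toy stand-in for sportscar$seat; "2" is ordered last so contr.sum()
# assigns it -1 on every column (an assumption about the level order)
seat <- factor(c("2", "4", "5", "4", "2"), levels = c("4", "5", "2"))

# Store the effects coding scheme with the factor itself
contrasts(seat) <- contr.sum(levels(seat))

# contr.sum() labels the columns 1, 2, ...; relabel them seat4 and seat5
colnames(contrasts(seat)) <- paste0("seat", levels(seat)[-nlevels(seat)])

contrasts(seat)  # inspect the stored coding scheme
```

Because the contrasts travel with the factor, any model fit on this data later will pick up the effects coding automatically.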
4. Making all the coefficients heterogeneous
If we want all the parameters in our model to be normally distributed across people, we need to change rpar to a vector of five "n"s -- one for each coefficient in the model. Each element of my_rpar should be named for one of the model coefficients. This is a bit tricky when you have factor predictors, because `mlogit()` generates new coefficient names for the coded variables when it codes the factors.
But we can fit a non-hierarchical model and use it to get the names for the my_rpar vector. First I fit m3, which is non-hierarchical. Then I take the names of the coefficients from m3 and assign them to the names of my_rpar.
When I print my_rpar we can see that it has the names of the coefficients.
I can then pass my_rpar into mlogit() to fit the hierarchical model m8.
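Putting the steps together, a hedged sketch of the workflow. The model formula, and the assumption that `sportscar` is already in mlogit's long format, are illustrative stand-ins for the lesson's setup rather than its exact code.

```r
# Sketch only: assumes the mlogit package is installed and `sportscar`
# is already an mlogit-ready (long-format) data frame; the formula here
# is an illustrative assumption, not the lesson's exact specification
library(mlogit)

# Non-hierarchical model, fit first to learn the coefficient names that
# mlogit() generates for the coded factor variables
m3 <- mlogit(choice ~ 0 + seat + trans + convert + price, data = sportscar)

# One "n" (normally distributed) entry per coefficient, named to match m3
my_rpar <- rep("n", length(coef(m3)))
names(my_rpar) <- names(coef(m3))
my_rpar

# Hierarchical model: rpar tells mlogit() which coefficients vary across
# people; panel = TRUE because each respondent made several choices
m8 <- mlogit(choice ~ 0 + seat + trans + convert + price, data = sportscar,
             rpar = my_rpar, panel = TRUE)
```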
5. Hierarchical model parameters
If we plot the estimated parameters for seat4 and seat5, we get a density plot for each parameter. We can see that the seat4 parameter ranges from about -0.8 to +0.4 across people. This model suggests that there are some people who like four seats and others who dislike them.
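One way to draw these densities is with mlogit's `rpar()` extractor; this is a sketch that assumes the fitted hierarchical model m8 from the previous step.

```r
# Sketch: rpar() pulls one random parameter out of the fitted model m8,
# and its plot method draws the estimated density across people
plot(rpar(m8, "seat4"))
plot(rpar(m8, "seat5"))
```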
6. Coefficient for the base level
There is one other cool feature of effects coding. With dummy coding, the value of the left-out level is zero. When we dummy coded the seat variable, the seat4 and seat5 variables both took the value of zero for 2-seat sports cars.
With effects coding, 2-seat cars are assigned -1 for both the seat4 and seat5 variables. So, the value for 2-seat sports cars isn't zero; under effects coding, it is the negative sum of the coefficients for the other levels. We can compute the population-average value of two seats and see that it is around -0.16, which means 2-seat cars are less preferred on average.
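The arithmetic is a one-liner. The coefficient values below are illustrative stand-ins chosen to land near the lesson's -0.16, not the actual estimates from m8.

```r
# Illustrative population-mean coefficients (not the lesson's estimates)
seat4_coef <- 0.07
seat5_coef <- 0.09

# Under effects coding, the base level's value is the negative sum of
# the coefficients for the other levels
seat2_coef <- -(seat4_coef + seat5_coef)
seat2_coef  # -0.16: 2-seat cars sit below the average seat effect
```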
7. Let's try it with the `chocolate` data!
Let's try more random coefficients with the chocolate data!