Converting from wide to long
1. Converting from wide to long
Whenever I get a data set in wide format, I like to convert it to long format before I start my analysis.
2. Why long format?
I prefer long format data for two reasons. First, it makes it easier to summarize and visualize the data before we start modeling. Second, it works better for storing choices with different numbers of alternatives. If question 1 had a fourth alternative, I could just add another row for it.
3. Sportscar data in wide format
Here we have the top of a data frame called sportscar_wide, which contains the sportscar data in a wide format. The first row describes the first question asked to respondent 1. We can see from the choice column that the respondent picked alternative 3 in the first choice. Then there are three columns for each attribute to describe the features of each of the three alternatives in each choice. We want to get the sportscar_wide data back to long format so that we can analyze it.
4. Transforming from wide to long
We can transform data from wide to long format using the reshape() function. It has a lot of inputs, so let's go through them carefully. The first input to reshape() is the name of the data frame that we want to transform. Here it is sportscar_wide. Next, we tell reshape() that we want to go from wide to long format with the direction = "long" input.
The next input, called varying, is the most important. It tells reshape() which columns contain each of the attributes we want to stack up. Columns 5 to 7 contain the seat attribute for the three alternatives. Columns 8 to 10 contain the trans attribute. Columns 11 to 13 contain the convert attribute. Columns 14 to 16 contain the price attribute. You might have to go back to the previous slide to check that I got those right.
The last line of inputs tells reshape() how to label the columns in our new long data frame. For the columns that we are stacking up, we want the names to be seat, trans, convert and price, and we tell that to reshape() using the v.names input. Finally, timevar = "alt" tells reshape() to include a column numbering the alternatives and call it alt.
And that's it. With this one command, we can get our data into long format. When we use head(sportscar) to see the top of our new data frame, it looks a lot like the long format sportscar data frame that we saw in Chapter 1. But there are two last things we need to do to tidy up the long data. First, it would be good to sort sportscar, so that the alternatives for the same question are in successive rows.
5. Sorting the long data
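Before we sort, here is a minimal sketch of the reshape() call just described, run on a tiny made-up stand-in for sportscar_wide. The column positions, v.names and timevar follow the description above. The column names resp_id, ques and segment, the .1/.2/.3 suffixes, and every value in the toy data are assumptions for illustration only.

```r
# Toy stand-in for sportscar_wide: two questions for one respondent.
# Columns 1-4 are identifiers plus choice; columns 5-16 hold the attributes.
# All names except choice, seat, trans, convert and price are assumed.
sportscar_wide <- data.frame(
  resp_id   = c(1, 1),
  ques      = c(1, 2),
  segment   = c("basic", "basic"),
  choice    = c(3, 1),
  seat.1    = c(2, 5), seat.2    = c(5, 2), seat.3    = c(5, 4),
  trans.1   = c("manual", "auto"),
  trans.2   = c("auto", "manual"),
  trans.3   = c("auto", "auto"),
  convert.1 = c("yes", "no"),
  convert.2 = c("no", "no"),
  convert.3 = c("no", "yes"),
  price.1   = c(35, 30), price.2 = c(40, 35), price.3 = c(30, 40)
)

# Stack the three alternatives of each question into separate rows.
sportscar <- reshape(
  sportscar_wide,
  direction = "long",
  varying   = list(5:7, 8:10, 11:13, 14:16),  # one group of columns per attribute
  v.names   = c("seat", "trans", "convert", "price"),
  timevar   = "alt"                            # new column numbering the alternatives
)

head(sportscar)
```

Each of the two wide rows becomes three long rows, one per alternative. Because no idvar is supplied, reshape() also adds an id column identifying the original wide row.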
We can sort the data by creating a new row order, new_order, with the order() function. Then, when we reorder the rows of the data frame by passing new_order into the square brackets, all the alternatives belonging to the same question end up together. There is one last thing to fix. Notice that choice is still stored as an integer: the number of the chosen alternative. Usually, in long data, the choice column contains TRUE and FALSE or 0 and 1 to indicate whether the alternative on that row was chosen. We can easily convert the choice variable using a little logic that I'll show you on the next slide.
6. Converting choice to a logical
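As a recap of the sorting step, here is a small sketch on a made-up long data frame that is still stacked by alternative, the way reshape() leaves it. The sort keys resp_id and ques are assumed column names, and the values are invented for illustration.

```r
# Toy long data, stacked by alternative (alt 1 rows, then alt 2, then alt 3).
# resp_id and ques are assumed names for the respondent and question columns.
sportscar <- data.frame(
  resp_id = c(1, 1, 1, 1, 1, 1),
  ques    = c(1, 2, 1, 2, 1, 2),
  alt     = c(1, 1, 2, 2, 3, 3),
  choice  = c(3, 1, 3, 1, 3, 1),
  price   = c(35, 30, 40, 35, 30, 40)
)

# order() returns the row positions that sort by respondent,
# then question, then alternative.
new_order <- order(sportscar$resp_id, sportscar$ques, sportscar$alt)

# Indexing the rows with new_order rearranges the data frame.
sportscar <- sportscar[new_order, ]

head(sportscar)
```

After this step, the three alternatives of question 1 sit in the first three rows, followed by the three alternatives of question 2.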
I use the logic sportscar$choice == sportscar$alt to create a TRUE for the rows that contain the chosen alternative and a FALSE in the other rows. After I assign that back to the choice column, our data is ready to go.
7. Let's practice!
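Before the exercises, here is a quick sketch of that conversion on a made-up, already sorted long data frame. The column names resp_id and ques and all values are assumptions for illustration.

```r
# Toy long data: choice still holds the number of the chosen alternative.
sportscar <- data.frame(
  resp_id = c(1, 1, 1, 1, 1, 1),
  ques    = c(1, 1, 1, 2, 2, 2),
  alt     = c(1, 2, 3, 1, 2, 3),
  choice  = c(3, 3, 3, 1, 1, 1)
)

# TRUE on the row whose alternative number matches the recorded choice,
# FALSE on every other row.
sportscar$choice <- sportscar$choice == sportscar$alt

# Sanity check: each question should have exactly one chosen alternative.
chosen_per_question <- tapply(sportscar$choice, sportscar$ques, sum)

sportscar
```

If any entry of chosen_per_question differs from 1, the choice and alt columns are probably misaligned, which is worth checking before modeling.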
Now let's try all of that with the chocolate data.