Analyzing MaxDiff Using Standard Logit Models Using R
The required setup of the data
Using the example described in detail in Counts Analysis of Max-Diff Data, the setup of the data for analysis using the mlogit library is shown below. Note that:
- Each of the six blocks is represented by two separate sets, each containing three rows.
- The first set consists of three alternatives: C, D and E. These are represented as three rows, with each row containing a 1 in the column of the alternative it represents (i.e., a Dummy Variable).
- A 1 appears in the Choice column in the row corresponding to alternative C and this indicates that C was chosen as Best.
- The second set appears in the next three rows, with -1 shown instead of 1 for each of the three alternatives.
- A 1 appears in the Choice column in the row corresponding to alternative E (the 6th row of data) and this indicates that E was chosen as Worst.
ID Block Set Choice  A  B  C  D  E  F
 1     1   1      1  0  0  1  0  0  0
 1     1   1      0  0  0  0  1  0  0
 1     1   1      0  0  0  0  0  1  0
 1     1   2      0  0  0 -1  0  0  0
 1     1   2      0  0  0  0 -1  0  0
 1     1   2      1  0  0  0  0 -1  0
 1     2   3      1  1  0  0  0  0  0
 1     2   3      0  0  0  0  1  0  0
 1     2   3      0  0  0  0  0  1  0
 1     2   4      0 -1  0  0  0  0  0
 1     2   4      0  0  0  0 -1  0  0
 1     2   4      1  0  0  0  0 -1  0
 1     3   5      1  0  1  0  0  0  0
 1     3   5      0  0  0  0  1  0  0
 1     3   5      0  0  0  0  0  0  1
 1     3   6      0  0 -1  0  0  0  0
 1     3   6      0  0  0  0 -1  0  0
 1     3   6      1  0  0  0  0  0 -1
 1     4   7      0  1  0  0  0  0  0
 1     4   7      1  0  0  1  0  0  0
 1     4   7      0  0  0  0  0  0  1
 1     4   8      0 -1  0  0  0  0  0
 1     4   8      0  0  0 -1  0  0  0
 1     4   8      1  0  0  0  0  0 -1
 1     5   9      0  1  0  0  0  0  0
 1     5   9      0  0  1  0  0  0  0
 1     5   9      1  0  0  1  0  0  0
 1     5  10      0 -1  0  0  0  0  0
 1     5  10      1  0 -1  0  0  0  0
 1     5  10      0  0  0 -1  0  0  0
 1     6  11      1  0  1  0  0  0  0
 1     6  11      0  0  0  0  0  1  0
 1     6  11      0  0  0  0  0  0  1
 1     6  12      0  0 -1  0  0  0  0
 1     6  12      0  0  0  0  0 -1  0
 1     6  12      1  0  0  0  0  0 -1
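The layout above can also be generated from a compact description of each block. The following is a minimal sketch in base R, not the code used to create the example file. The names blocks, makeSet and trickedSketch are hypothetical, and the Best and Worst picks are simply read off the table above.

```r
# Alternatives shown in each block, with the Best and Worst picks
# (transcribed from the table above).
blocks <- list(
  list(shown = c("C", "D", "E"), best = "C", worst = "E"),
  list(shown = c("A", "D", "E"), best = "A", worst = "E"),
  list(shown = c("B", "D", "F"), best = "B", worst = "F"),
  list(shown = c("A", "C", "F"), best = "C", worst = "F"),
  list(shown = c("A", "B", "C"), best = "C", worst = "B"),
  list(shown = c("B", "E", "F"), best = "B", worst = "F")
)
alternatives <- LETTERS[1:6]

# Build the rows for one set. `sign` is +1 for a Best set and -1 for a Worst set.
makeSet <- function(id, block, set, shown, chosen, sign) {
  dummies <- sign * outer(shown, alternatives, "==")
  colnames(dummies) <- alternatives
  data.frame(ID = id, Block = block, Set = set,
             Choice = as.integer(shown == chosen), dummies)
}

rows <- list()
for (b in seq_along(blocks)) {
  blk <- blocks[[b]]
  # Odd-numbered sets hold the Best choice, even-numbered sets the Worst choice.
  rows[[length(rows) + 1]] <- makeSet(1, b, 2 * b - 1, blk$shown, blk$best, 1)
  rows[[length(rows) + 1]] <- makeSet(1, b, 2 * b, blk$shown, blk$worst, -1)
}
trickedSketch <- do.call(rbind, rows)
head(trickedSketch, 6)
```

Printing the first six rows should reproduce the first block of the table, which is a quick way to check a hand-built file before estimating the model.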
Reading the data into R
This code reads in the example data shown above:
trickedData = read.csv('http://surveyanalysis.org/images/8/82/TrickedLogitMaxDiffExample.csv')
Estimating the model
If modifying this code for a different data set (e.g., see Max-Diff Analysis Case Study Using R), note that:
- nAltsPerSet = 3 specifies the number of alternatives shown in each block. If the number of alternatives varies by set, the code will need to be further modified (refer to the mlogit documentation for more information).
- Choice is the name of the Outcome Variable.
- The letters in the formula of B + C + D + E + F refer to the names of the variables which represent the alternatives. Note that the name of the first alternative is not used (this is required for identification).
library(mlogit)
nAltsPerSet = 3
mlogit(Choice ~ B + C + D + E + F | 0,
       data = trickedData,
       alt.levels = paste(1:nAltsPerSet),
       shape = "long")
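To see what is being maximized, the standard logit log-likelihood for data in this layout can be written in a few lines of base R. This is only an illustrative sketch under the assumption that each Set is treated as an independent choice; it is not mlogit's internal code. The names trickedLogLik and block1 are hypothetical, with block1 holding just the first block of the example data, and the coefficient values are the ones reported in the output in the next section.

```r
# Log-likelihood for data laid out as in the example table.
# `beta` is a named vector for B..F; A is fixed at 0 for identification.
trickedLogLik <- function(beta, data, alts = c("B", "C", "D", "E", "F")) {
  X <- as.matrix(data[, alts])
  utility <- as.vector(X %*% beta[alts])
  # Within each set, the probability of a row is exp(utility) / sum over the set.
  denom <- ave(exp(utility), data$Set, FUN = sum)
  sum(log(exp(utility) / denom)[data$Choice == 1])
}

# First block of the example data: C, D and E shown; C is Best, E is Worst.
block1 <- data.frame(
  Set    = c(1, 1, 1, 2, 2, 2),
  Choice = c(1, 0, 0, 0, 0, 1),
  B = 0,
  C = c(1, 0, 0, -1, 0, 0),
  D = c(0, 1, 0, 0, -1, 0),
  E = c(0, 0, 1, 0, 0, -1),
  F = 0
)

zero     <- c(B = 0, C = 0, D = 0, E = 0, F = 0)
reported <- c(B = -16.572, C = 17.715, D = -32.902, E = -50.011, F = -66.934)

trickedLogLik(zero, block1)      # every alternative equally likely
trickedLogLik(reported, block1)  # much closer to 0, i.e. a far better fit
```

The -1 coding in the Worst sets is what makes the alternative with the lowest parameter the most likely pick in those sets.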
Interpreting the outputs
The above should return the following output (you may get slightly different numbers):
Call:
mlogit(formula = Choice ~ B + C + D + E + F | 0, data = trickedData, alt.levels = paste(1:nAlternatives), shape = "long", method = "nr", print.level = 0)

Coefficients:
      B        C        D        E        F
-16.572   17.715  -32.902  -50.011  -66.934
This model estimates all the parameter values relative to Alternative A, where A has a parameter of 0. Thus, the parameter values are:
  A    B    C    D    E    F
  0  -17   18  -33  -50  -67
The most straightforward interpretation of these values is that they indicate the rank ordering of the preferences. That is, these data are consistent with a rank ordering of C > A > B > D > E > F. Note that this is a clear improvement over the counts analysis, which was unable to discern that A > B (see Counts Analysis of Max-Diff Data).
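The differences between the scale values also have a meaning. For example, the probability that the respondent would have chosen alternative C if presented with a choice set of all six alternatives is computed using the inverse logit transformation (where $\beta_j$ is the parameter for the $j$th alternative):
$$P(C) = \frac{e^{\beta_C}}{e^{\beta_A} + e^{\beta_B} + e^{\beta_C} + e^{\beta_D} + e^{\beta_E} + e^{\beta_F}} = \frac{e^{\beta_C}}{\sum_{j} e^{\beta_j}}$$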
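As an illustration, the inverse logit transformation can be evaluated in R using the coefficients reported above. This is a sketch rather than part of the original analysis: choiceProb is a hypothetical helper, and subtracting the largest parameter before exponentiating is only a numerical-stability step that leaves the probabilities unchanged.

```r
# Estimated parameters, with A fixed at 0.
beta <- c(A = 0, B = -16.572, C = 17.715, D = -32.902, E = -50.011, F = -66.934)

# Probability of each alternative being chosen from the set `shown`.
choiceProb <- function(beta, shown = names(beta)) {
  u <- beta[shown]
  expU <- exp(u - max(u))  # same result as exp(u), but avoids overflow
  expU / sum(expU)
}

probs <- choiceProb(beta)
probs["C"]  # very close to 1: C dominates when all six alternatives are shown
```

Passing a subset of names as shown gives the probabilities for a smaller choice set, for example choiceProb(beta, c("A", "B")).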
Comparing between respondents
Often it is useful to compare the max-diff results of different respondents (e.g., if conducting a segmentation study). When conducting such a comparison it is important to keep in mind that it is not meaningful to use the estimated parameters in standard statistical analyses (e.g., it is not appropriate to compute an average parameter score across a sample or to use the parameter estimates in multivariate techniques such as Cluster Analysis and Principal Components Analysis). This is because the parameter values contain two pieces of information: the preferred ordering of the alternatives and the degree of noise in the data. It is this "noise" component that makes comparison between respondents difficult (i.e., the noise causes a difference in the scale factor).
The following example swaps the Best and Worst choices in the third block (rows 13 to 18 of the data) and re-estimates the model:
trickedData[13,4] = 0
trickedData[15,4] = 1
trickedData[16,4] = 1
trickedData[18,4] = 0
mlogit(Choice ~ B + C + D + E + F | 0,
       data = trickedData,
       alt.levels = paste(1:nAltsPerSet),
       shape = "long")
In the resulting output, shown below, note that:
- The overall range of values has changed dramatically. Previously the values ranged from -67 to 18, whereas now they range from -18 to 17.
- The parameter for A is still assumed to be 0 and thus the parameters for A, B and C have not been changed much, but the parameters for D, E and F are now all very similar to the value for B. Thus, even though the parameter for B has not changed much, its meaning (i.e., its value relative to D, E and F) has changed markedly. Using the formula described earlier, the first model suggests that if choosing between B, D, E and F, there is a 100% chance somebody would choose B, whereas with the new parameters this drops to 21%.
Coefficients:
      B        C        D        E        F
-17.577   16.852  -16.784  -18.369  -17.577
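The two percentages quoted above can be checked by applying the inverse logit formula to only the alternatives B, D, E and F. The following is a small sketch using the two sets of reported coefficients; subsetProb and the vector names original and modified are hypothetical labels introduced for this illustration.

```r
# Coefficients from the first model and from the model after the swap (A fixed at 0).
original <- c(A = 0, B = -16.572, C = 17.715, D = -32.902, E = -50.011, F = -66.934)
modified <- c(A = 0, B = -17.577, C = 16.852, D = -16.784, E = -18.369, F = -17.577)

# Probability of each alternative in `shown` being chosen from that subset.
subsetProb <- function(beta, shown) {
  u <- beta[shown]
  expU <- exp(u - max(u))  # shift by the maximum for numerical stability
  expU / sum(expU)
}

shown <- c("B", "D", "E", "F")
pOriginal <- subsetProb(original, shown)["B"]
pModified <- subsetProb(modified, shown)["B"]
round(c(original = unname(pOriginal), modified = unname(pModified)), 2)
# original is approximately 1.00 and modified is approximately 0.21
```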
If we also change the Best and Worst choices in the first block:
trickedData[1,4] = 0
trickedData[2,4] = 1
trickedData[6,4] = 0
trickedData[5,4] = 1
mlogit(Choice ~ B + C + D + E + F | 0,
       data = trickedData,
       alt.levels = paste(1:nAltsPerSet),
       shape = "long")
we get a new set of values with a substantially reduced range (-2 to 1, compared to our original -67 to 18). That the range of values is much smaller is informative: it tells us that this data is much less clear in terms of its preferences. However, the important takeaway is that any standard statistical analysis of the parameters will be greatly influenced by the respondents with less noise in their data.
Coefficients:
       B         C         D         E         F
-2.04252   0.76729  -1.26024  -1.90273  -2.04252
Ways of making the parameters more comparable are to:
- Focus only on the rank ordering of the parameters (see Max-Diff Analysis Case Study Using R).
- Compute the probability of each alternative being chosen first from amongst all the alternatives, using the formula described above (also, see Max-Diff Analysis Case Study Using R).
- Use a mixture model (see Max-Diff Analysis Case Study Using R).
- Scale each respondent's parameters such that the minimum value is 0 and the maximum value is 1. This is not strictly valid, as it assumes that ordinal data has interval-scale properties, but it is certainly better than not scaling the data.
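Three of these options (rank ordering, first-choice probabilities and 0-to-1 scaling) can be sketched in base R using the three sets of coefficients reported above, treated here as if they came from three different respondents. The row labels, the matrix name fits and the helper rescale01 are hypothetical; this is an illustration of the transformations, not a recommended analysis pipeline.

```r
# One row per set of estimates, with A fixed at 0.
fits <- rbind(
  original     = c(A = 0, B = -16.572,  C = 17.715,  D = -32.902,  E = -50.011,  F = -66.934),
  firstChange  = c(A = 0, B = -17.577,  C = 16.852,  D = -16.784,  E = -18.369,  F = -17.577),
  secondChange = c(A = 0, B = -2.04252, C = 0.76729, D = -1.26024, E = -1.90273, F = -2.04252)
)

# Rank ordering within each row (1 = most preferred; ties share an average rank).
ranks <- t(apply(fits, 1, function(x) rank(-x)))

# Probability of each alternative being chosen first from all six alternatives.
shares <- t(apply(fits, 1, function(x) {
  expX <- exp(x - max(x))  # shift by the maximum for numerical stability
  expX / sum(expX)
}))

# Rescale each row so that its minimum is 0 and its maximum is 1.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- t(apply(fits, 1, rescale01))

ranks
round(shares, 3)
round(scaled, 2)
```

After rescaling, every row spans the same 0-to-1 range, so no single row dominates an average or a cluster analysis purely because of its scale factor.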
Comments
- It is commonplace to trick software into estimating models other than those it was designed for (e.g., using linear regression to estimate ANOVA). However, the tricking that occurs in this example is different and not without risk, as:
- The model that is estimated violates the i.i.d. assumption of the Multinomial Logit Model (i.e., the Best and Worst choices are not independent).
- The idea of reversing the coding for the Worst choice is not consistent with Random Utility Theory.
- The violation of the i.i.d. assumption means that statistical tests reported by software that is tricked will be biased.
- The most prominent purveyor of max-diff software, Sawtooth Software, uses this 'tricking' method in all its max-diff software products, and no adverse consequences have been reported in the decade or so that it has been using this method.