MaxDiff Data File Layouts

From Displayr

There is no standard way of laying out the data from MaxDiff experiments. The descriptions below cover some of the common layouts. The layout names used here are not standard terms.

Example data

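In this example, we assume that the data has been collected with the following experimental design, where there are 4 alternatives, 5 blocks (i.e., questions, also referred to as sets below) and 3 alternatives in every block. A 1 indicates that the alternative is shown in the block and a 0 that it is not: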

           block
alternative 1 2 3 4 5
          A 1 1 1 0 1
          B 1 1 0 1 1
          C 1 0 1 1 0
          D 0 1 1 1 1

Further, we assume that there are two respondents, whose choices were as follows:

Respondent 1:
Best:  A, A, A, B, A
Worst: C, D, D, D, D

Respondent 2:
Best:  B, B, A, B, B
Worst: C, D, C, C, D
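
For readers who want to follow the layouts programmatically, the design and choices above can be written down as plain Python data. This is only an illustrative sketch: the names DESIGN, CHOICES and check_choices are made up for this example and are not part of Q or Displayr.

```python
# Alternatives shown in each block, read column by column from the design table.
DESIGN = {
    1: ["A", "B", "C"],
    2: ["A", "B", "D"],
    3: ["A", "C", "D"],
    4: ["B", "C", "D"],
    5: ["A", "B", "D"],
}

# Best and worst choices per respondent, one entry per block.
CHOICES = {
    1: {"best": ["A", "A", "A", "B", "A"], "worst": ["C", "D", "D", "D", "D"]},
    2: {"best": ["B", "B", "A", "B", "B"], "worst": ["C", "D", "C", "C", "D"]},
}


def check_choices(design, choices):
    """Return a list of (respondent, block, problem) tuples; empty if consistent."""
    problems = []
    for respondent, picks in choices.items():
        pairs = zip(picks["best"], picks["worst"])
        for block, (best, worst) in enumerate(pairs, start=1):
            shown = design[block]
            if best not in shown or worst not in shown:
                problems.append((respondent, block, "choice not shown in block"))
            if best == worst:
                problems.append((respondent, block, "best equals worst"))
    return problems


print(check_choices(DESIGN, CHOICES))  # an empty list means no problems were found
```

A check of this kind is only possible when the file records which alternatives were shown, which is relevant to the comparison of layouts that follows.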

Super-Compressed Layout

  • Each respondent is shown in a separate row.
  • Each set is represented by two variables, one indicating the best choice and another the worst. Typically these are ordered by set (i.e., the best and worst for the first set next to each other, followed by the best and worst for the second set, etc.), but sometimes all the best and all the worst are grouped together.
  • The choices recorded are relative to the set. That is, if a respondent chooses the third alternative that appeared in a particular set then a value of 3 is stored in the file, even if this alternative was the fourth of the alternatives in the study.
  • Where randomization has occurred, additional variables will store the order of the blocks and/or alternatives.

Using the above example:

B1 W1 B2 W2 B3 W3 B4 W4 B5 W5
1  3  1  3  1  3  1  3  1  3
2  3  2  3  1  2  1  2  2  3

This format is the easiest to create. It is also the least useful for data analysis (as it has to be converted to one of the other formats) and the most prone to error, as:

  • It is impossible for the person analyzing the data to identify errors that occur in the implementation of the experimental design (e.g., respondents who were meant to be shown one set but were shown another).
  • Any miscommunications between the data processing team and the end-user of the file regarding randomization will result in the data being incorrectly analyzed.
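
As a rough illustration of how the Super-Compressed values arise, the sketch below encodes each choice as its position within the block in which it was shown. It assumes no randomization, so that alternatives appear within each block in the order A, B, C, D; the names DESIGN, CHOICES and super_compressed_row are hypothetical.

```python
# Alternatives shown in each block, in the order assumed to be displayed.
DESIGN = {
    1: ["A", "B", "C"],
    2: ["A", "B", "D"],
    3: ["A", "C", "D"],
    4: ["B", "C", "D"],
    5: ["A", "B", "D"],
}

CHOICES = {
    1: {"best": ["A", "A", "A", "B", "A"], "worst": ["C", "D", "D", "D", "D"]},
    2: {"best": ["B", "B", "A", "B", "B"], "worst": ["C", "D", "C", "C", "D"]},
}


def super_compressed_row(design, picks):
    """Encode each choice as its 1-based position within its own block."""
    row = []
    for block in sorted(design):
        shown = design[block]
        row.append(shown.index(picks["best"][block - 1]) + 1)   # B1, B2, ...
        row.append(shown.index(picks["worst"][block - 1]) + 1)  # W1, W2, ...
    return row


for respondent, picks in CHOICES.items():
    print(respondent, super_compressed_row(DESIGN, picks))
```

Note that the resulting numbers cannot be interpreted without DESIGN, which is why this layout depends so heavily on accurate communication about the design and any randomization.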

Compressed Layout

This layout is the same as the Super-Compressed Layout, except that the choices recorded are relative to the complete set. That is, if a respondent chooses the third alternative that appeared in a particular set but the alternative was the fourth of the alternatives in the study, then a value of 4 is stored in the file.

Using the above example:

B1 W1 B2 W2 B3 W3 B4 W4 B5 W5
1  3  1  4  1  4  2  4  1  4
2  3  2  4  1  3  2  3  2  4

Although marginally easier to analyze than the super-compressed format, this layout is generally undesirable due to the potential for miscommunications regarding randomization.

This format is only usable once any randomization has been removed, so that the order of the variables matches the question order of the original design. The variables should also carry labels rather than just values. This non-randomized format is the one used by the analysis options under Anything > Advanced Analysis > MaxDiff.
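
The relationship between the two compressed layouts can be sketched as a lookup through the design: a position within a set is mapped to the alternative shown there, and then to that alternative's position in the complete list. The sketch assumes no randomization and columns ordered B1, W1, B2, W2, and so on; the names ALTERNATIVES, DESIGN and to_compressed are hypothetical.

```python
ALTERNATIVES = ["A", "B", "C", "D"]

# Alternatives shown in each block, in the order assumed to be displayed.
DESIGN = {
    1: ["A", "B", "C"],
    2: ["A", "B", "D"],
    3: ["A", "C", "D"],
    4: ["B", "C", "D"],
    5: ["A", "B", "D"],
}


def to_compressed(design, alternatives, super_row):
    """Convert within-set positions into positions in the complete list."""
    row = []
    for i, position in enumerate(super_row):
        block = i // 2 + 1                    # two columns (best, worst) per block
        label = design[block][position - 1]   # alternative shown at that position
        row.append(alternatives.index(label) + 1)
    return row


print(to_compressed(DESIGN, ALTERNATIVES, [1, 3, 1, 3, 1, 3, 1, 3, 1, 3]))
print(to_compressed(DESIGN, ALTERNATIVES, [2, 3, 2, 3, 1, 2, 1, 2, 2, 3]))
```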

Stacked Layout

  • Each set answered by each respondent is shown in a separate row. Thus, if there are 10 respondents and 5 sets the file contains 50 rows.
  • Each alternative is represented by a separate column.
  • The values in each variable are coded as:
      1: Best
      0: Shown but not selected
      -1: Worst
      MISSING VALUE: Not shown (displayed as . in the example below)

In general, this format is the easiest for data analysis, both because many analyses can be performed without any further data manipulation, and because it is less susceptible to errors than either of the compressed formats. A limitation of this layout is that it is not a Flat Data File (whereas most other questions in a survey are typically stored in a flat data file).

Using the above example:

ID	Set	A	B	C	D
1	1	1	0	-1	.
1	2	1	0	.	-1
1	3	1	.	0	-1
1	4	.	1	0	-1
1	5	1	0	.	-1
2	1	0	1	-1	.
2	2	0	1	.	-1
2	3	1	.	-1	0
2	4	.	1	-1	0
2	5	0	1	.	-1
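
The following sketch shows one way the stacked rows above could be produced from Compressed Layout data and the design. It is an illustration under the assumption of no randomization, using None for missing values; the names ALTERNATIVES, DESIGN, COMPRESSED and to_stacked are hypothetical and do not correspond to any function in Q or Displayr.

```python
ALTERNATIVES = ["A", "B", "C", "D"]

DESIGN = {
    1: ["A", "B", "C"],
    2: ["A", "B", "D"],
    3: ["A", "C", "D"],
    4: ["B", "C", "D"],
    5: ["A", "B", "D"],
}

# Compressed Layout rows (B1, W1, B2, W2, ...) keyed by respondent ID.
COMPRESSED = {
    1: [1, 3, 1, 4, 1, 4, 2, 4, 1, 4],
    2: [2, 3, 2, 4, 1, 3, 2, 3, 2, 4],
}


def to_stacked(design, alternatives, compressed):
    """Return one dict per respondent and set, coded 1 / 0 / -1 / None."""
    rows = []
    for respondent, values in compressed.items():
        for block in sorted(design):
            best = alternatives[values[2 * (block - 1)] - 1]
            worst = alternatives[values[2 * (block - 1) + 1] - 1]
            row = {"ID": respondent, "Set": block}
            for alt in alternatives:
                if alt not in design[block]:
                    row[alt] = None   # not shown
                elif alt == best:
                    row[alt] = 1      # best
                elif alt == worst:
                    row[alt] = -1     # worst
                else:
                    row[alt] = 0      # shown but not selected
            rows.append(row)
    return rows


for row in to_stacked(DESIGN, ALTERNATIVES, COMPRESSED):
    # Print missing values as "." to match the table above.
    print("\t".join("." if v is None else str(v) for v in row.values()))
```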

Wide Layout

The Wide Layout is the same as the Stacked Layout, except that each respondent's data is stored in a single row, with the first group of columns representing the first set, followed by the columns for the second set, and so on. Using the above example:

A1	B1	C1	D1	A2	B2	C2	D2	A3	B3	C3	D3	A4	B4	C4	D4	A5	B5	C5	D5
1	0	-1	.	1	0	.	-1	1	.	0	-1	.	1	0	-1	1	0	.	-1
0	1	-1	.	0	1	.	-1	1	.	-1	0	.	1	-1	0	0	1	.	-1

This is the format used when analyzing MaxDiff data as a Ranking question.

More exotic analyses in Q and Displayr use this layout.
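
Reshaping from the Stacked Layout to the Wide Layout can be sketched as follows. The sketch assumes that stacked rows are sorted by set within each respondent and uses None for missing values; the names ALTERNATIVES, STACKED and to_wide are hypothetical.

```python
ALTERNATIVES = ["A", "B", "C", "D"]

# Stacked rows as (ID, Set, A, B, C, D); None means "not shown".
STACKED = [
    (1, 1, 1, 0, -1, None),
    (1, 2, 1, 0, None, -1),
    (1, 3, 1, None, 0, -1),
    (1, 4, None, 1, 0, -1),
    (1, 5, 1, 0, None, -1),
    (2, 1, 0, 1, -1, None),
    (2, 2, 0, 1, None, -1),
    (2, 3, 1, None, -1, 0),
    (2, 4, None, 1, -1, 0),
    (2, 5, 0, 1, None, -1),
]


def to_wide(stacked, alternatives):
    """Collect each respondent's sets into one dict with keys A1, B1, ..., D5."""
    wide = {}
    for respondent, block, *codes in stacked:
        row = wide.setdefault(respondent, {})
        for alt, code in zip(alternatives, codes):
            row[f"{alt}{block}"] = code
    return wide


for respondent, row in to_wide(STACKED, ALTERNATIVES).items():
    # Print missing values as "." to match the table above.
    print(respondent, " ".join("." if v is None else str(v) for v in row.values()))
```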