Structure

A key property of a variable set is the structure. The Variable Set Structure determines:

  • How Tables are created and manipulated.
  • How variables appear when used in R code.

The Variable Set Structure is automatically inferred when the data is imported. It can be modified by selecting a Variable Set in the Data Tree and either:

  • Combining or splitting variables into/from a Variable Set (Split or Combine).
  • Modifying it in the Object Inspector under Properties > GENERAL > Structure.

Variable Set Structures vary, non-exhaustively, on the following dimensions:

  • The properties of the variables: Text, Binary, Nominal, Ordinal, Numeric, Date.
  • The number of variables in the set: one or more than one.
  • Whether the variables within a set are organized in a two-dimensional structure (i.e., a grid) or not.
  • Whether the variables contain structural dependencies (i.e., where the meaning of the values in one of the variables is structurally related to the meanings of another of the variables).

Single Variable

Text

A single variable containing text (or numeric data that is interpreted as text). For example, data obtained from a question like:

Please enter the name of the last soft drink you bought.

_____________

Nominal

A single variable that contains unordered, mutually exclusive, and exhaustive categories (i.e., has a nominal measurement scale). For example, data generated by the following question:

Are you...

o Male

o Female

Whereas a Text Variable Set stores the data as text, a Nominal Variable Set has both Value Attributes and Data Reductions.

Ordinal

A single variable that contains ordered, mutually exclusive, and exhaustive categories (i.e., has an ordinal measurement scale). For example, data generated by the following question:

How old are you?

o Under 30

o 30 to 49

o 50 or more

For most purposes, an Ordinal Variable Set is identical to a Nominal Variable Set. The only difference is that some statistical tests will take the ordering into account.

Numeric

A numeric variable (i.e., it has an interval or ratio measurement scale). For example, data that represents the temperature at a given point in time.

Date/Time

A numeric variable where the values represent times and/or dates. It contains the number of milliseconds since 1/1/1970.

JavaScript variables have special built-in functions for manipulating dates (e.g., use Q.Year/Month/Day/Hour/Second() to extract parts of a date or time, and Q.YearDif/MonthDif/WeekDif/DayDif/HourDif/MinuteDif/SecondDif() to compute the difference between two of them).

Date/Time variables can be converted to different time scales (e.g., months, weeks, minutes) by clicking on the variable and pressing Date/Time in the Object Inspector.
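
As a rough illustration of the millisecond representation described above, the following Python sketch converts between a stored value and a calendar date. It assumes the count is measured from midnight UTC on 1 January 1970. The function names are hypothetical and are not part of Displayr or Q.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the stored value counts milliseconds from midnight UTC, 1 Jan 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MS_PER_DAY = 86_400_000


def from_epoch_ms(ms):
    """Convert a stored millisecond count into a timezone-aware datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def to_epoch_ms(moment):
    """Convert a timezone-aware datetime into a whole number of milliseconds."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def whole_days_between(ms_a, ms_b):
    """Number of complete days from the first stored value to the second."""
    return (ms_b - ms_a) // MS_PER_DAY


start = to_epoch_ms(datetime(2024, 3, 1, tzinfo=timezone.utc))
end = to_epoch_ms(datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc))

print(from_epoch_ms(end).year, from_epoch_ms(end).month, from_epoch_ms(end).day)
print(whole_days_between(start, end))
```

Extracting the year, month, or day from the converted value, or differencing two stored values, mirrors the kind of operation the built-in date functions perform.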

Multiple variables

Text - Multi

Multiple related variables that contain text. For example, data generated from a question like:

Please type in the names of your three favorite soft drinks

1.____________

2.____________

3.____________

Binary - Multi

There are only two non-missing values in each variable. Where the variable originally contains more than two categories, they are combined (see Value Attributes). This is the main way that non-mutually exclusive categories are represented in a Data Set (see also Compact Binary - Multi below). Common examples of Binary - Multi Variable Sets are lists of products purchased by people in a customer database, and responses to multiple response questions in surveys, such as:

Which of the following have you bought in the past week? Tick all that apply.

[] Coke

[] Pepsi

[] Fanta

[] None of these

Note that a row in a Data Set can have three possible values in a variable in a Binary - Multi Variable Set: the value that corresponds to a category being applicable or being selected (1), the value that corresponds to it not being selected (0), and a missing value category, which is represented as a NaN in the data.
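
The three possible values can be illustrated with a small Python sketch. The data is made up, and excluding missing rows from the base of the percentage is an assumption made for this example rather than a statement of how Displayr computes its statistics.

```python
import math

NAN = float("nan")

# Hypothetical data: one row per respondent, one 1/0/NaN value per brand.
rows = [
    {"Coke": 1, "Pepsi": 0, "Fanta": 0},
    {"Coke": 1, "Pepsi": 1, "Fanta": 0},
    {"Coke": NAN, "Pepsi": NAN, "Fanta": NAN},  # respondent was not asked
    {"Coke": 0, "Pepsi": 0, "Fanta": 1},
]


def percent_selected(rows, variable):
    """Percentage of non-missing rows in which the category was selected."""
    values = [row[variable] for row in rows if not math.isnan(row[variable])]
    if not values:
        return NAN  # no valid rows, so the percentage is undefined
    return 100.0 * sum(values) / len(values)


for brand in ("Coke", "Pepsi", "Fanta"):
    print(brand, round(percent_selected(rows, brand), 1))
```

Here the third respondent contributes to neither the numerator nor the base, which is the distinction between "not selected" (0) and missing (NaN).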

Nominal - Multi

A set of categorical variables sharing the same scale points, where the scale points are mutually exclusive and unordered.

Which meal did you eat most recently at each of these restaurants?

              Breakfast   Lunch   Dinner
McDonald's        o         o       o
Burger King       o         o       o
Wendy's           o         o       o

Ordinal - Multi

A set of categorical variables sharing the same scale points, where the scale points are mutually exclusive and ordered.

In the vast majority of instances, Ordinal - Multi data is analyzed in the same way as Nominal - Multi data.

How would you rate your satisfaction with your most recent meal at each of these restaurants?

              Low   Medium   High
McDonald's     o      o       o
Burger King    o      o       o
Wendy's        o      o       o

Numeric - Multi

A series of numeric variables measured on the same scale. For example:

Next to the brands below, please indicate how many times you have purchased them in the past week.

Coke ___

Pepsi ___

Fanta ___

Grid

Binary - Grid

This is a generalization of a Binary - Multi Variable Set where the variables can be thought of as being ordered in two dimensions. For example, the data generated from a series of related questions such as:

Which of these brands are fun?

[] Coke [] Pepsi [] Fanta

Which of these brands are sexy?

[] Coke [] Pepsi [] Fanta

Which of these brands are masculine?

[] Coke [] Pepsi [] Fanta

Displayr infers the structure of the grid by inspecting the variables' labels at the time of importing the data. Where Displayr cannot discern the structure of the data, it can be set when changing the Variable Set structure.
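
To make the idea of a two-dimensional arrangement concrete, the Python sketch below groups variables into rows and columns based on their labels. The "Attribute: Brand" label format, the separator, and the function name are all assumptions made for illustration. The rules Displayr actually uses to infer a grid are not described here.

```python
def arrange_grid(labels, separator=": "):
    """Split each label into (row, column) and collect the grid layout.

    Assumes every label has the hypothetical form "<row><separator><column>".
    """
    row_names, column_names, cells = [], [], {}
    for label in labels:
        row, column = label.split(separator, 1)
        if row not in row_names:
            row_names.append(row)
        if column not in column_names:
            column_names.append(column)
        cells[(row, column)] = label
    return row_names, column_names, cells


# Hypothetical variable labels for the brand-attribute questions above.
labels = [
    "Fun: Coke", "Fun: Pepsi", "Fun: Fanta",
    "Sexy: Coke", "Sexy: Pepsi", "Sexy: Fanta",
    "Masculine: Coke", "Masculine: Pepsi", "Masculine: Fanta",
]

row_names, column_names, cells = arrange_grid(labels)
print(row_names)
print(column_names)
```

Nine variables are thereby treated as a grid of three attributes by three brands, rather than as a flat list.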

Numeric - Grid

This is a generalization of a Numeric - Multi Variable Set, where the variables can be ordered in two dimensions. For example, the data generated by:

In the past month, how many economy flights did you take on...

Qantas ___ United ___ SAS ___

In the past month, how many business class flights did you take on...

Qantas ___ United ___ SAS ___

Displayr infers the structure of the grid by inspecting the variables' labels at the time of importing the data. Where Displayr cannot discern the structure of the data, it can be set by changing the Variable Set structure.

Structural dependencies

Binary - Multi (Compact)

The same underlying data as a Binary - Multi Variable Set, except that it is stored in a max-multi format. That is, the first variable contains the first response, the second variable contains the second response, etc. This format should only be used to represent multiple response data when there are truly huge code frames (e.g., thousands of options). It is generally inferior to a Nominal structure as it is unwieldy for data manipulation (e.g., for use in formulas) and it cannot accommodate the notion of missing data.
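
The relationship between the two layouts can be sketched in Python. The data and the function name are hypothetical. The example expands max-multi rows into one 0/1 indicator per category.

```python
def compact_to_binary(compact_rows, categories):
    """Expand max-multi rows into one 0/1 indicator per category."""
    binary_rows = []
    for responses in compact_rows:
        # Empty slots (None) simply mean "no further response".
        chosen = {response for response in responses if response is not None}
        binary_rows.append({c: int(c in chosen) for c in categories})
    return binary_rows


categories = ["Coke", "Pepsi", "Fanta"]

# Hypothetical max-multi data: first response, second response, third response.
compact_rows = [
    ["Pepsi", "Coke", None],
    ["Fanta", None, None],
    [None, None, None],  # nothing selected, or not asked? The layout cannot say.
]

binary_rows = compact_to_binary(compact_rows, categories)
for row in binary_rows:
    print(row)
```

The last row shows the limitation noted above: an empty compact row expands to all zeros, so a respondent who selected nothing cannot be told apart from one whose data is missing.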

Ranking

Multiple numeric variables that represent a ranking, where the highest number is most preferred and ties are permitted. For example:

Rank the following brands according to how much you like them... Place a 3 next to the brand you like most, a 2 next to your next preferred brand and a 1 next to your least preferred brand.

Coke ____

Pepsi ____

Fanta ____

Note that if your question uses the lowest numbers to indicate the most preferred alternatives, you will need to reverse the values assigned to each rank.
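
A minimal Python sketch of such a reversal follows. It assumes ranks run from 1 to the number of alternatives, so that each value v becomes (number of alternatives + 1) - v. The function name and data are hypothetical.

```python
def reverse_ranks(ranks, n_alternatives=None):
    """Flip a ranking so that the highest number becomes the most preferred.

    Assumes ranks run from 1 to n_alternatives; missing ranks (None) are kept.
    """
    n = n_alternatives if n_alternatives is not None else len(ranks)
    return {
        name: (None if rank is None else n + 1 - rank)
        for name, rank in ranks.items()
    }


# Hypothetical respondent from a survey where 1 meant "like most".
collected = {"Coke": 1, "Pepsi": 3, "Fanta": 2}
reversed_ranks = reverse_ranks(collected)
print(reversed_ranks)
```

After reversal, Coke holds the highest number, matching the convention that the highest number is most preferred. Tied ranks remain tied.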

Experiment

This structure is used to represent the various types of experiments, from fully randomized experiments through to Conjoint Analysis and Choice Modeling (see Experiments in Q).


Which of these would you buy?

Coke    Pepsi   Fanta
$2.00   $2.10   $1.80
  o       o       o