# Category:Weights

A weight variable, often simply called the weights, is a variable used to weight data during analysis.

In most situations, when people refer to weights, they mean sampling weights. However, there are other types of weights.

# Sampling weights

Most commonly, weights reflect different probabilities of selection for inclusion in the sample. For example, if a sample contains 70% women but it is known that women make up only 51% of the population, the data can be weighted to reflect this. Such weights are known as sampling weights or probability weights. The process of creating a weight to reflect different probabilities of selection for inclusion in a sample is sometimes referred to as sample balancing.
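The 70%/51% example can be sketched as simple cell weighting: each group's weight is its population share divided by its sample share. This is a minimal Python sketch that assumes a hypothetical sample of 100 respondents; the function name `cell_weights` is illustrative only, and real sample balancing often adjusts several variables at once (for example, by raking).

```python
# Hypothetical sample of 100 respondents: 70 women and 30 men (70% women).
sample_counts = {"female": 70, "male": 30}
# Known population shares: 51% women, 49% men.
population_shares = {"female": 0.51, "male": 0.49}


def cell_weights(sample_counts, population_shares):
    """Weight each group by (population share / sample share)."""
    n = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }


weights = cell_weights(sample_counts, population_shares)
# After weighting, women should account for 51% of the weighted total.
weighted_total = sum(sample_counts[g] * weights[g] for g in sample_counts)
weighted_female_share = sample_counts["female"] * weights["female"] / weighted_total

print(round(weights["female"], 3), round(weights["male"], 3))  # 0.729 1.633
print(round(weighted_female_share, 2))  # 0.51
```

Women are weighted down (below 1) because they are over-represented in the sample, and men are weighted up.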

Sampling weights are computed to be proportional to the inverse of the probability of selection. For example, if one respondent has a weight of 2 and another has a weight of 1, the person with a weight of 2 had only half the chance of being selected for the survey that the other person had.
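The inverse relationship can be sketched as follows. The helper `sampling_weights` and the selection probabilities are hypothetical, and rescaling the weights to average 1 is one common convention rather than a requirement: any constant multiple is equally "proportional to the inverse".

```python
def sampling_weights(selection_probs):
    """Weights proportional to 1 / P(selection), rescaled to average 1."""
    raw = [1.0 / p for p in selection_probs]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


# Hypothetical selection probabilities: respondent A had a 1% chance of being
# selected, respondent B a 2% chance (twice as likely as A).
w_a, w_b = sampling_weights([0.01, 0.02])
print(w_a / w_b)  # 2.0
```

Respondent A, who was half as likely to be selected, ends up with twice the weight of respondent B.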

# Frequency weights

Sometimes data files contain frequency weights (also known as replication weights). For example, if two people gave the same answers in a survey, the data file may contain the data for only one of them, with a frequency weight indicating that the record should be counted as two people (i.e., the weight would have a value of 2). Please note: if data are weighted by profit, customer value, frequency or consumption (i.e., volumetric analysis), the weights are most likely sampling weights, not frequency weights.
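A minimal sketch of how a frequency weight stands in for duplicated rows, using a hypothetical compressed data file: the weighted results match the results from the expanded data, and the number of people is the sum of the weights.

```python
# Hypothetical compressed data file: each row is (answer, frequency weight).
compressed = [(5, 2), (3, 1), (4, 3)]

# Expanding each row by its frequency weight recreates one row per person.
expanded = [answer for answer, freq in compressed for _ in range(freq)]

# With frequency weights, the number of people is the sum of the weights...
n_people = sum(freq for _, freq in compressed)
# ...and the weighted mean equals the plain mean of the expanded data.
weighted_mean = sum(answer * freq for answer, freq in compressed) / n_people
expanded_mean = sum(expanded) / len(expanded)

print(n_people, len(expanded))  # 6 6
print(weighted_mean == expanded_mean)  # True
```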

# Variance weights

Typically (but not always), this is just another name for frequency weights.

# Replicate weights

A replicate weight is a special type of sampling weight used to estimate sampling variance in complex surveys. Releasing replicate weights instead of the underlying design variables (such as strata and clusters) also helps protect the privacy of respondents. Replicate weights are rarely used in practice, and then almost exclusively by government statistical agencies, because they are complex to work with. Analyzing data containing replicate weights requires special-purpose software (e.g., Westat's WesVar, Stata); replicate weights are not discussed further.

Note that replication weights are different from replicate weights (replication weights are frequency weights).

# Expansion weights

An expansion weight is used so that the weighted data reflects the size of the population of interest (e.g., if there are 300 million people in the population, but 300 people in the survey, a weight of 1,000,000 may be assigned to each respondent).
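The expansion factor in this example is simply the population size divided by the sample size. A sketch, with the helper `expansion_factor` and the car-ownership figure both hypothetical:

```python
def expansion_factor(population_size, sample_size):
    """Weight per respondent so that weighted counts sum to the population size."""
    return population_size / sample_size


factor = expansion_factor(300_000_000, 300)
print(factor)  # 1000000.0

# Hypothetical: 120 of the 300 respondents own a car.
estimated_owners = 120 * factor
print(estimated_owners)  # 120000000.0
```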

Expansion weights are also referred to as expansion factors, inflation factors and grossing-up factors. They are typically multiplied by sampling weights to create a single weight. This approach works when the analysis software can accommodate sampling weights, but not when the analysis program treats weights as frequency weights, because the expansion then inflates the sample size and renders all statistical tests invalid.
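The sketch below, using hypothetical weights, multiplies sampling weights (scaled to average 1) by the expansion factor, then shows why treating the result as frequency weights is a problem: a standard error computed with n equal to the sum of the weights is far too small. The simple-random-sampling formula here ignores the design effect of unequal weights and is only meant to show the scale of the distortion.

```python
import math

# Hypothetical survey: 300 respondents with sampling weights that average 1
# (150 respondents weighted 0.5 and 150 weighted 1.5).
sampling_w = [0.5] * 150 + [1.5] * 150
population_size = 300_000_000
n = len(sampling_w)

# Multiply each sampling weight by the expansion factor N / n.
factor = population_size / n
combined = [w * factor for w in sampling_w]
print(sum(combined))  # 300000000.0


def srs_standard_error(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


p = 0.5
# Treated correctly, the sample size is still the 300 respondents.
print(round(srs_standard_error(p, n), 4))  # 0.0289
# A program that treats the combined weights as frequency weights uses
# n = 300,000,000, so the standard error is about 1,000 times too small.
print(srs_standard_error(p, sum(combined)))
```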

# Analytic weights

These are weights that reflect the different levels of precision of different observations. For example, when analyzing data in which each observation is the average result for a geographic area, the analytic weight is proportional to the inverse of the estimated variance of that average. Analytic weights are quite obscure in survey analysis and are not discussed further.
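Inverse-variance weighting can be sketched as below, with hypothetical area averages and variances; `inverse_variance_mean` is an illustrative name. If each area's average is based on n_i observations drawn with a common variance, the variance of that average is proportional to 1/n_i, so the analytic weight reduces to the number of observations in each area.

```python
def inverse_variance_mean(estimates, variances):
    """Combine estimates using analytic weights proportional to 1 / variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)


# Hypothetical areas: average scores and the estimated variances of those averages.
area_means = [10.0, 20.0]
area_variances = [1.0, 4.0]
print(inverse_variance_mean(area_means, area_variances))  # 12.0
```

The more precise first area (variance 1) pulls the combined estimate toward 10, rather than to the unweighted midpoint of 15.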

# Importance weights

These are weights that reflect the importance of observations in the sample. In commercial research, the most common importance weights are things like profitability, household size or typical consumption levels. Importance weights can be, and typically are, treated as sampling weights.
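To illustrate how an importance weight changes results when it is treated as a sampling weight, the sketch below computes the share of satisfied customers with and without profit weights. The data and the `weighted_share` helper are hypothetical.

```python
def weighted_share(flags, weights):
    """Share of total weight held by observations where the flag is True."""
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


# Hypothetical customers: (satisfied?, annual profit from the customer).
customers = [(True, 100.0), (True, 50.0), (False, 400.0), (True, 50.0)]
satisfied = [s for s, _ in customers]
profit = [p for _, p in customers]

print(weighted_share(satisfied, [1.0] * len(customers)))  # 0.75
print(round(weighted_share(satisfied, profit), 3))  # 0.333
```

Unweighted, 75% of customers are satisfied; weighted by profit, satisfied customers account for only a third of profit, because the single dissatisfied customer is the most profitable.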