Tandem Clustering

From Displayr

Tandem clustering is the practice of first using a dimension-reduction technique, such as principal components analysis, factor analysis or multiple correspondence analysis, to create new variables, and then using cluster analysis or latent class analysis to form segments from those new variables.
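
The two steps can be sketched as follows. This is a minimal illustration rather than a recommended workflow: it assumes numeric input, uses principal components analysis for the reduction step and a plain k-means (Lloyd's algorithm) for the clustering step, and runs on synthetic data. The function names (pca_scores, kmeans, tandem_cluster) and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def pca_scores(X, n_components):
    """Standardize the columns of X and project onto the leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized data: rows of Vt are the component directions.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def tandem_cluster(X, n_components, k, seed=0):
    """Step 1: reduce to component scores. Step 2: cluster the scores."""
    scores = pca_scores(X, n_components)
    return kmeans(scores, k, seed=seed), scores

# Synthetic data (an assumption for illustration): 100 respondents, 6 variables.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 6)), rng.normal(3, 1, (50, 6))])

labels, scores = tandem_cluster(X, n_components=2, k=2)
print(scores.shape, np.bincount(labels, minlength=2))
```

Note that the clustering step only ever sees the retained component scores, so any information in the discarded components cannot influence the segments.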

Motivations for tandem clustering

A number of different motivations underlie the usage of tandem clustering:

  1. As a way of permitting categorical data to be used in clustering techniques that assume data is numeric. For example, as multiple correspondence analysis converts categories into numeric variables, these can then be analyzed in k-means cluster analysis.
  2. As a form of variable standardization (e.g., allowing analysis to be conducted using a combination of binary variables and 11-point scale variables).
  3. As a way of reducing redundancies in the data. Studies often contain many variables that are highly correlated (e.g., attitude towards different brands is often highly correlated with usage of the brands). If this redundancy is not removed from the data, the segmentation can end up identifying differences that reflect the number of redundant variables (i.e., the variables that are more closely correlated with the other variables tend to dominate the segmentation).

Appropriateness of tandem clustering

The key justification for tandem clustering is that it is a relatively quick and easy way of addressing each of these motivations.[1] While this justification is still valid in some instances, each of the motivations for tandem clustering is questionable:

  1. Historically, tandem clustering was the only practical solution to combining numeric and categorical data. However, modern cluster analysis methods (e.g., SPSS Two Step Clustering) and latent class methods are able to automatically accommodate combinations of different data types, so this justification for the use of tandem clustering is no longer applicable.
  2. Variable standardization is not, and never has been, a good justification for tandem clustering, as variable standardization often undermines the objectives of segmentation studies (see Variable Standardization).
  3. The issue of whether removing redundancies is an appropriate motivation is more difficult. Dimension reduction involves two qualitatively distinct transformations of the data: (1) the most highly correlated variables are combined into dimensions and thus they are down-weighted relative to less important variables; and (2) the least important variables are excluded from further analysis because the dimensions that they are correlated with are usually determined to be immaterial (e.g., have eigenvalues less than 1). Thus, tandem clustering has the net effect of causing variables with moderate levels of correlation with other variables to be given relatively greater prominence in the segmentation than would otherwise occur.
If clustering a single set of variables (e.g., ratings of respondents' opinions on different topics), it would appear to be unambiguously dangerous to apply tandem clustering, as the way that it deals with the redundancies is at odds with the goal of segmentation. That is, in this situation, if there is a set of variables that are highly correlated then it is these variables that should be central to the segmentation. Similarly, the variables that are excluded could contain information that is relevant to the formation of the segments.[2]
Where the segmentation is being conducted with lots of different types of data (e.g., brand preference, behavior, importance ratings), a failure to address the redundancies can result in uninteresting segmentations, so in this situation tandem clustering is more defensible (particularly if time constraints prevent a more considered method of addressing the problem of redundancies).
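
The two transformations described in point 3 can be illustrated with a small worked example. The correlation matrix below is hypothetical: three variables correlated at 0.9 with each other, two variables correlated at 0.3 with each other, and no correlation between the two blocks. The "eigenvalue greater than 1" cut-off is the rule mentioned above.

```python
import numpy as np

# Hypothetical correlation matrix (an assumption for illustration):
# variables 0-2 form a highly correlated block, variables 3-4 a weakly correlated one.
R = np.eye(5)
R[:3, :3] = 0.9
R[3:, 3:] = 0.3
np.fill_diagonal(R, 1.0)

# Eigenvalues of the correlation matrix, largest first.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Keep only dimensions with eigenvalues greater than 1.
retained = eigenvalues[eigenvalues > 1.0]

print(np.round(eigenvalues, 2))  # 2.8, 1.3, 0.7, 0.1, 0.1
print(len(retained))             # 2
```

Five variables become two retained dimensions: one summarizing the three highly correlated variables and one summarizing the other two. If the dimension scores are standardized before clustering, as factor scores commonly are (an assumption here, not something the text above specifies), each dimension carries equal weight, so the three highly correlated variables share the same influence that the two weakly correlated variables share. The discarded dimensions (eigenvalues 0.7, 0.1 and 0.1) play no part in the segmentation at all.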

An additional problem with tandem clustering is that typically tandem clustering is used in conjunction with k-means cluster analysis, which is generally inferior to latent class analysis (see The Relationship Between Cluster Analysis, Latent Class Analysis and Self-Organizing Maps).

An alternative to tandem clustering

It is possible to get the same basic benefits of tandem clustering without any of its limitations as follows:

  • Use latent class analysis.
  • If a set of variables is playing too great or too little a role in the segmentation, weight these variables to change their relative importance. In Q, for example, this is done by clicking Advanced from within the Segments dialog box and modifying the Weight (larger values increase the importance of the set of questions).
  • If wanting to modify the variance of some numeric variables relative to other numeric variables, recode them to change their range (i.e., variables with a larger range have a bigger impact on the segmentation, all else being equal).
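
The effect of range on a distance-based segmentation can be seen in a small worked example. It assumes squared Euclidean distance, as used by k-means, and two hypothetical respondents measured on a binary variable and an 11-point (0 to 10) rating. The squared_distance helper and its weights argument are illustrative only; they do not describe how Q applies its Weight setting internally.

```python
def squared_distance(a, b, weights=None):
    """Weighted squared Euclidean distance between two respondents."""
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))

# Hypothetical respondents: [binary variable, rating on a 0-10 scale].
r1 = [0, 2]
r2 = [1, 8]

# Per-variable contributions on the original scales: the rating dominates.
contributions = [(x - y) ** 2 for x, y in zip(r1, r2)]   # [1, 36]

# Recode the rating to a 0-1 range by dividing by 10.
r1_recoded = [r1[0], r1[1] / 10]
r2_recoded = [r2[0], r2[1] / 10]
recoded_total = squared_distance(r1_recoded, r2_recoded)  # about 1.36

# Dividing a variable by 10 is equivalent to giving its squared
# difference a weight of 1/100 on the original scale.
weighted_total = squared_distance(r1, r2, weights=[1.0, 0.01])

print(contributions, round(recoded_total, 2), round(weighted_total, 2))
```

On the original scales the rating accounts for 36 of the 37 units of squared distance; after recoding, the binary variable accounts for 1 of roughly 1.36. Under this distance measure, recoding a variable's range and weighting it are two ways of making the same adjustment.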

References

  1. Kotler, P. (1997). Marketing Management: Analysis, Planning, Implementation, and Control. Upper Saddle River, New Jersey, Prentice Hall International, Inc.
  2. Green, P. E. and A. M. Krieger (1995). "Alternative approaches to cluster-based market segmentation." Journal of the Market Research Society 37(3): 231-239.