Category:Principal Components Analysis (page created by Tim Bock, 2017-02-28)
<div>Principal components analysis identifies interrelationships between variables. In Displayr: '''Insert > More (Analysis) > Dimension Reduction > Principal Components Analysis'''.<br />
<br />
== Applications of Principal Components Analysis in Survey Analysis == <br />
<br />
* Understand how attitudes and/or behaviors are interrelated.<br />
* [[Identify Redundant Questions in a Questionnaire]]<br />
* Check [[Multi-Item Scale]]s (i.e., if a scale has been developed to measure two abstract dimensions using ten variables, then principal components analysis should recover these same two dimensions).<br />
* [[Identify Redundant Concepts in New Product Testing]]<br />
* Summarize data.<br />
* [[Transformations|Transform]] data prior to the application of other multivariate techniques (e.g., cluster analysis: see [[Data Preparation for Cluster-Based Segmentation]] or [[regression]]).<br />
<br />
== An example ==<br />
<br />
The following [[Correlation Matrix|correlation matrix]] shows correlations between viewing of a number of different television programs in Britain.<ref>Ehrenberg, Andrew S. C. 1981. The Problem of Numeracy. The American Statistician 35 (May):67-70.</ref> If you inspect the table you will see it reveals some patterns:<br />
* People who watch any one of the sports programs are more likely to watch one of the other sports programs. <br />
* People who watch one current affairs program are more likely to watch another, and vice versa.<br />
<br />
{| class="wikitable" style="text-align:center"<br />
|-<br />
! !! World of<br>Sport !! Grandstand !! Match of<br>the Day !! Professional<br>Boxing !! Rugby<br>Special !! Panorama !! 24 Hours !! Line-Up !! Today !! This<br>Week<br />
|-<br />
| align='right'| '''World of Sport'''|| 1.0 || .6 || .6 || .5 || .3 || .2 || .1 || .1 || .1 || .1 <br />
|-<br />
| align='right'| '''Grandstand'''|| .6 || 1.0 || .6 || .5 || .3 || .2 || .1 || .1 || .1 || .1 <br />
|-<br />
| align='right'| '''Match of the Day'''|| .6 || .6 || 1.0 || .5 || .3 || .1 || .1 || .0 || .0 || .1 <br />
|-<br />
| align='right'| '''Professional Boxing'''|| .5 || .5 || .5 || 1.0 || .3 || .2 || .1 || .1 || .1 || .1 <br />
|-<br />
| align='right'| '''Rugby Special'''|| .3 || .3 || .3 || .3 || 1.0 || .1 || .1 || .1 || .1 || .1 <br />
|-<br />
| align='right'| '''Panorama'''|| .2 || .2 || .1 || .2 || .1 || 1.0 || .5 || .2 || .2 || .4 <br />
|-<br />
| align='right'| '''24 Hours'''|| .1 || .1 || .1 || .1 || .1 || .5 || 1.0 || .3 || .2 || .4 <br />
|-<br />
| align='right'| '''Line-Up'''|| .1 || .1 || .0 || .1 || .1 || .2 || .3 || 1.0 || .2 || .2 <br />
|-<br />
| align='right'| '''Today'''|| .1 || .1 || .0 || .1 || .1 || .2 || .2 || .2 || 1.0 || .3 <br />
|-<br />
| align='right'| '''This Week'''|| .1 || .1 || .1 || .1 || .1 || .4 || .4 || .2 || .3 || 1.0 <br />
|}<br />
<br />
Where a set of variables are correlated with each other, a plausible explanation is that there is some other variable that they are all correlated with. For example, the reason that viewership of each of the sports programs is correlated with viewership of the others may be that they are all correlated with a more general variable, propensity to watch sports programs. Similarly, the ''factor'' that might explain the correlation among viewership of the current affairs programs may be that people differ in their propensity to view current affairs programs. Principal components analysis is a statistical technique that attempts to uncover such factors (also known as ''components'').<br />
<br />
Various [[algorithms]] have been developed that seek to compute the underlying factors from the available data, on the assumption that such factors exist. Principal components analysis is the most widely used of these algorithms. The following output has been generated in SPSS using a [[Varimax Rotation]] ([[Principal Components Example Syntax|click here]] for the [[Syntax|syntax]]).<br />
<br />
=== Communalities ===<br />
<br />
The ''communalities'' measure the extent to which each variable is explained by the components (i.e., the proportion of the variable's variance that the retained components account for). Note that ''Today'' has the lowest communality, which indicates that viewing of the ''Today'' program is less well explained by the analysis than viewing of any of the other programs (increasing the number of components increases the communality of all the variables).<br />
<br />
[[File:SPSSCommunalities.png]]<br />
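The communality calculation can be sketched in Python with NumPy. This is a minimal illustration, not the SPSS procedure: it assumes the correlation matrix is typed in from the table above, whose values are rounded to one decimal place, so the results will only approximate the SPSS output. The variable names (<code>R</code>, <code>loadings</code>, <code>communalities</code>) are chosen here for illustration.

```python
import numpy as np

# Correlation matrix transcribed from the table above (rounded to one decimal).
programs = ["World of Sport", "Grandstand", "Match of the Day",
            "Professional Boxing", "Rugby Special", "Panorama",
            "24 Hours", "Line-Up", "Today", "This Week"]
R = np.array([
    [1.0, 0.6, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1],
    [0.6, 1.0, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1],
    [0.6, 0.6, 1.0, 0.5, 0.3, 0.1, 0.1, 0.0, 0.0, 0.1],
    [0.5, 0.5, 0.5, 1.0, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1],
    [0.3, 0.3, 0.3, 0.3, 1.0, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.1, 0.2, 0.1, 1.0, 0.5, 0.2, 0.2, 0.4],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.5, 1.0, 0.3, 0.2, 0.4],
    [0.1, 0.1, 0.0, 0.1, 0.1, 0.2, 0.3, 1.0, 0.2, 0.2],
    [0.1, 0.1, 0.0, 0.1, 0.1, 0.2, 0.2, 0.2, 1.0, 0.3],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.4, 0.4, 0.2, 0.3, 1.0],
])

# Eigendecomposition of the symmetric matrix; eigh returns ascending order,
# so reorder to put the largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings for the first k components: eigenvector scaled by sqrt(eigenvalue).
k = 2
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])

# Communality of a variable = sum of its squared loadings on the k components.
communalities = (loadings ** 2).sum(axis=1)

for name, h2 in zip(programs, communalities):
    print(f"{name:<20} {h2:.2f}")
```

Because the rotation used in the SPSS output is orthogonal, it leaves each variable's communality unchanged, which is why the unrotated loadings are sufficient for this calculation.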
<br />
=== Total Variance Explained ===<br />
<br />
The three right-most columns of ''Total Variance Explained'' contain the most important information in this table, and are interpreted as follows:<br />
* Two factors (i.e., components) have been saved. That is, the analysis assumes that the 10 original variables can be reduced to 2 underlying factors. (The number of components selected has been determined by the [[Kaiser Rule]].)<br />
* The two components explain 51% of the variance in the data. That is, when it is assumed that there are two components, we can predict 51% of the information in all the 10 variables. (If the 10 variables were uncorrelated, we would expect two components to explain only 2/10 = 20%.)<br />
* The first component explains more of the variance than the second component (29% versus 22%).<br />
<br />
<br />
[[File:SPSSVarianceExplained.png]]<br />
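The quantities in this table can be approximated from the eigenvalues of the correlation matrix. The following is a minimal sketch in Python with NumPy, assuming the matrix is transcribed from the rounded table in the example above, so the percentages will not exactly reproduce the SPSS figures. The Kaiser rule is applied here in its usual form (retain components with eigenvalues greater than 1).

```python
import numpy as np

# Correlation matrix transcribed from the table above (rounded to one decimal).
R = np.array([
    [1.0, 0.6, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1],
    [0.6, 1.0, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1],
    [0.6, 0.6, 1.0, 0.5, 0.3, 0.1, 0.1, 0.0, 0.0, 0.1],
    [0.5, 0.5, 0.5, 1.0, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1],
    [0.3, 0.3, 0.3, 0.3, 1.0, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.1, 0.2, 0.1, 1.0, 0.5, 0.2, 0.2, 0.4],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.5, 1.0, 0.3, 0.2, 0.4],
    [0.1, 0.1, 0.0, 0.1, 0.1, 0.2, 0.3, 1.0, 0.2, 0.2],
    [0.1, 0.1, 0.0, 0.1, 0.1, 0.2, 0.2, 0.2, 1.0, 0.3],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.4, 0.4, 0.2, 0.3, 1.0],
])

# Eigenvalues of the correlation matrix, largest first.
eigvals = np.linalg.eigvalsh(R)[::-1]

# Each eigenvalue is the variance explained by one component; with 10
# standardized variables the total variance is 10 (the trace of R).
pct = 100 * eigvals / eigvals.sum()
cumulative = np.cumsum(pct)

# Kaiser rule: keep components whose eigenvalue exceeds 1.
n_keep = int((eigvals > 1).sum())

for i, (ev, p, c) in enumerate(zip(eigvals, pct, cumulative), start=1):
    print(f"Component {i:2d}: eigenvalue {ev:5.2f}  {p:5.1f}%  cumulative {c:5.1f}%")
print("Components retained by the Kaiser rule:", n_keep)
```

These are the unrotated (initial) percentages. An orthogonal rotation such as Varimax redistributes the explained variance between the retained components but leaves their total unchanged, so only the split between the first and second components differs after rotation.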
<br />
=== Rotated Component Matrix ===<br />
<br />
The ''rotated component matrix'', sometimes referred to as the ''loadings'', is the key output of principal components analysis. It contains estimates of the correlations between each of the variables and the estimated components. In this example:<br />
* There are moderate-to-strong [[Correlation|correlations]] between the five sports programs and component 1. <br />
* The correlations between the current affairs programs and the first component are very low. Typically, when interpreting a ''component matrix'', correlations with an absolute value of less than 0.3 or 0.4 are regarded as trivial. (Loadings can also be negative; in that case, values between about -0.4 or -0.3 and 0.0 are likewise regarded as trivially small.)<br />
* Thus, the first component seems to measure propensity to watch sports programs.<br />
* There are moderate-to-strong correlations between the five current affairs programs and the second component and low correlations between the sports programs and this component. Thus, the second component seems to measure propensity to watch current affairs programs. <br />
<br />
[[File:SPSSRotatedCompentMatrix.png]]<br />
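A rotated component matrix of this kind can be approximated as follows. This is a sketch only, under several assumptions: the correlation matrix is transcribed from the rounded table in the example above; the function <code>varimax</code> and its parameters <code>max_iter</code> and <code>tol</code> are hypothetical names written for this illustration (a commonly used SVD-based iteration for the varimax criterion); and the Kaiser row normalization that some packages apply before rotating is omitted. The loadings will therefore differ somewhat from the SPSS output, and the signs and ordering of the components are arbitrary.

```python
import numpy as np

programs = ["World of Sport", "Grandstand", "Match of the Day",
            "Professional Boxing", "Rugby Special", "Panorama",
            "24 Hours", "Line-Up", "Today", "This Week"]
# Correlation matrix transcribed from the table above (rounded to one decimal).
R = np.array([
    [1.0, 0.6, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1],
    [0.6, 1.0, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1],
    [0.6, 0.6, 1.0, 0.5, 0.3, 0.1, 0.1, 0.0, 0.0, 0.1],
    [0.5, 0.5, 0.5, 1.0, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1],
    [0.3, 0.3, 0.3, 0.3, 1.0, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.1, 0.2, 0.1, 1.0, 0.5, 0.2, 0.2, 0.4],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.5, 1.0, 0.3, 0.2, 0.4],
    [0.1, 0.1, 0.0, 0.1, 0.1, 0.2, 0.3, 1.0, 0.2, 0.2],
    [0.1, 0.1, 0.0, 0.1, 0.1, 0.2, 0.2, 0.2, 1.0, 0.3],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.4, 0.4, 0.2, 0.3, 1.0],
])


def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonally rotate loadings L (variables x components) toward the
    varimax criterion. Hypothetical helper; no Kaiser normalization."""
    p, k = L.shape
    T = np.eye(k)          # accumulated rotation matrix
    prev = 0.0
    for _ in range(max_iter):
        LR = L @ T
        # Gradient-like target for the varimax criterion, solved via SVD.
        target = LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p
        U, s, Vt = np.linalg.svd(L.T @ target)
        T = U @ Vt
        crit = s.sum()
        if crit < prev * (1 + tol):   # stop when the criterion stops improving
            break
        prev = crit
    return L @ T


# Unrotated loadings for the first two principal components.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

rotated = varimax(loadings)
for name, row in zip(programs, rotated):
    print(f"{name:<20} {row[0]:6.2f} {row[1]:6.2f}")
```

In the printed matrix, each program should load mainly on one of the two components, with the sports programs sharing one component and the current affairs programs the other, which is the pattern described in the bullet points above.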
<br />
== See also ==<br />
<br />
* [[The Basic Mechanics of Principal Components Analysis]]<br />
* [[Saved Principal Components Analysis Variables]]<br />
* [[Common Misinterpretations of Principal Components Analysis]]<br />
<br />
== Also known as ==<br />
<br />
[[Factor Analysis]] (technically this is a different method, but most people who say "factor analysis" mean principal components analysis).<br />
<br />
== References ==<br />
<br />
{{reflist}}<br />
<br />
[[Category:Definitions]]<br />
[[Category:Displayr]]</div>