Scree plot

Diagnostic plot
A sample scree plot produced in R. The Kaiser criterion is shown in red.

In multivariate statistics, a scree plot is a line plot of the eigenvalues of factors or principal components in an analysis.[1] The scree plot is used to determine the number of factors to retain in an exploratory factor analysis (FA) or principal components to keep in a principal component analysis (PCA). The procedure of finding statistically significant factors or components using a scree plot is also known as a scree test. Raymond B. Cattell introduced the scree plot in 1966.[2]

A scree plot always displays the eigenvalues in a downward curve, ordered from largest to smallest. To apply the scree test, one locates the "elbow" of the graph, the point where the eigenvalues seem to level off, and retains the factors or components to the left of this point as significant.[3]
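The values shown on a scree plot can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: it takes the eigenvalues from the correlation matrix of the data (one common choice for PCA on standardized variables), and the function name `scree_eigenvalues` and the simulated data set are hypothetical, not taken from any cited source.

```python
import numpy as np

def scree_eigenvalues(data):
    """Return the eigenvalues of the correlation matrix, largest first."""
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)  # columns are variables
    eigvals = np.linalg.eigvalsh(corr)      # symmetric input, ascending output
    return eigvals[::-1]                    # reverse to descending order

# Hypothetical data: 200 observations of 6 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
data = latent @ loadings + 0.5 * rng.normal(size=(200, 6))

eigenvalues = scree_eigenvalues(data)
for rank, value in enumerate(eigenvalues, start=1):
    print(rank, round(float(value), 3))
```

Plotting the printed rank against the eigenvalue as a line plot gives the scree plot. Because the eigenvalues come from a correlation matrix, they sum to the number of variables.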

Etymology

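The scree plot is named for its resemblance to scree, the loose rock debris that collects at the foot of a cliff: the steep initial drop in eigenvalues corresponds to the cliff face, and the flat tail beyond the elbow corresponds to the rubble at its base.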

Criticism

This test is sometimes criticized for its subjectivity. Scree plots can have multiple "elbows" that make it difficult to know the correct number of factors or components to retain, making the test unreliable. There is also no standard for the scaling of the x and y axes, which means that different statistical programs can produce different plots from the same data.[4]

The test has also been criticized for retaining too few factors or components.[1]

The "elbow" has been defined as the point of maximum curvature, because maximum curvature captures the leveling-off effect that operators use to identify knees. This definition led to the creation of the Kneedle algorithm.[5]
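As a rough illustration of automated elbow detection, the sketch below scales both axes to the unit interval and picks the point that lies farthest below the straight line joining the first and last eigenvalues. This is a simplified stand-in, not the published Kneedle algorithm, which also involves smoothing and a sensitivity threshold; the function name `find_elbow` and the example eigenvalues are hypothetical.

```python
def find_elbow(eigenvalues):
    """Return the 1-based position of the elbow in a descending sequence.

    Both axes are scaled to [0, 1]; the elbow is taken to be the point
    farthest below the chord joining the first and last points.
    """
    n = len(eigenvalues)
    if n < 3:
        raise ValueError("need at least three eigenvalues")
    top, bottom = eigenvalues[0], eigenvalues[-1]
    if top == bottom:
        raise ValueError("eigenvalues are all equal; no elbow")
    best_index, best_gap = 0, float("-inf")
    for i, value in enumerate(eigenvalues):
        x = i / (n - 1)                        # scaled component number
        y = (value - bottom) / (top - bottom)  # scaled eigenvalue
        gap = (1.0 - x) - y                    # vertical distance below the chord
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index + 1

# Hypothetical eigenvalues, already sorted from largest to smallest.
example = [3.8, 1.4, 0.3, 0.2, 0.15, 0.15]
elbow = find_elbow(example)
retained = elbow - 1  # the scree test keeps the components left of the elbow
print(elbow, retained)
```

For these example values the elbow falls at the third component, so the scree test would retain the first two. A plot with several elbows can still mislead a rule like this one, which reflects the subjectivity criticism above.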

References

  1. ^ a b George Thomas Lewith; Wayne B. Jonas; Harald Walach (23 November 2010). Clinical Research in Complementary Therapies: Principles, Problems and Solutions. Elsevier Health Sciences. p. 354. ISBN 978-0-7020-4916-3.
  2. ^ Cattell, Raymond B. (1966). "The Scree Test For The Number Of Factors". Multivariate Behavioral Research. 1 (2): 245–276. doi:10.1207/s15327906mbr0102_10. PMID 26828106.
  3. ^ Alex Dmitrienko; Christy Chuang-Stein; Ralph B. D'Agostino (2007). Pharmaceutical Statistics Using SAS: A Practical Guide. SAS Institute. p. 380. ISBN 978-1-59994-357-2.
  4. ^ Norman, Geoffrey R.; Streiner, David L. (15 September 2007). Biostatistics: The bare essentials. PMPH-USA. p. 201. ISBN 978-1-55009-400-8 – via Google Books.
  5. ^ Satopaa, Ville; Albrecht, Jeannie; Irwin, David; Raghavan, Barath (2011-06-20). Finding a "kneedle" in a haystack: Detecting knee points in system behavior. 2011 31st International Conference on Distributed Computing Systems Workshops. Institute of Electrical and Electronics Engineers. pp. 166–171. doi:10.1109/ICDCSW.2011.20 – via IEEE Xplore.