Principal Components and Factor Analysis in R

1. Objective

In this R tutorial, we will learn exactly what principal components and factor analysis in R mean. We will then move on to the principal components themselves and the functions used to compute them.

2. Introduction to Principal Components and Factor Analysis in R

Principal component analysis and factor analysis are multivariate analysis methods. Their aim is to reveal systematic covariation among a group of variables. The analysis can be motivated in many different ways, including describing the basic anomaly patterns that appear in spatial data sets.

The analysis is always performed on a symmetric correlation or covariance matrix, which means the input data must be numeric.
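As a minimal sketch of the matrix the analysis works on, the snippet below builds the covariance and correlation matrices for R's built-in USArrests data set. The data set is chosen here only for illustration and is not part of the original tutorial.

```r
# Built-in data set: 50 US states, 4 numeric variables (illustrative choice).
data(USArrests)
X <- USArrests

# The input must be numeric, so check every column first.
all_numeric <- all(sapply(X, is.numeric))

# Covariance and correlation matrices of the variables.
S <- cov(X)
R <- cor(X)

# Both matrices are square (p x p) and symmetric.
print(all_numeric)
print(isSymmetric(S))
print(isSymmetric(R))
```

The correlation matrix is simply the covariance matrix of the standardized variables, which is why its diagonal is all ones.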

3. What are Principal Components in R?

A principal component is a normalized linear combination of the original predictors in a data set. We can write the first principal component in the following way:
$$Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \phi_{31} X_3 + \dots + \phi_{p1} X_p$$
$Z_1$ is the first principal component.
$\phi_1 = (\phi_{11}, \phi_{21}, \dots, \phi_{p1})$ is the loading vector, made up of the loadings of the first principal component. The loadings are constrained so that their sum of squares equals 1, because otherwise arbitrarily large loadings would produce an arbitrarily large variance. The loading vector also defines the direction of the principal component ($Z_1$) along which the data vary the most. It corresponds to the line in p-dimensional space that is closest to the n observations, where closeness is measured by average squared Euclidean distance.
$X_1, \dots, X_p$ are the normalized predictors. Normalized predictors have a mean of zero and a standard deviation of one.
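The definition above can be checked by hand. The following sketch assumes the built-in USArrests data (an illustrative choice, not from the original text). It computes the first loading vector from the correlation matrix and forms the first principal component as a weighted sum of the normalized predictors.

```r
data(USArrests)

# Normalize the predictors: mean 0 and standard deviation 1 per column.
X <- scale(USArrests)

# The eigenvectors of the correlation matrix are the loading vectors.
e <- eigen(cor(USArrests))
phi1 <- e$vectors[, 1]

# The loadings are constrained to a unit sum of squares.
print(sum(phi1^2))

# First principal component: weighted sum of the normalized predictors,
# computed for every observation at once.
Z1 <- as.vector(X %*% phi1)

# The variance of Z1 equals the largest eigenvalue.
print(var(Z1))
print(e$values[1])
```

The sign of a loading vector is arbitrary, so another routine may return the same component multiplied by -1.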

4. Why Use Principal Components Analysis?

The main aim of principal components analysis is to reveal hidden structure in a data set. In doing so, we may be able to do the following:
a. Identify how different variables work together to create the dynamics of the system.
b. Reduce the dimensionality of the data.
c. Decrease redundancy in the data.
d. Filter some of the noise in the data.
e. Compress the data.
f. Prepare the data for further analysis using other techniques.
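Dimensionality reduction and redundancy removal can be illustrated with a short sketch. It assumes the built-in USArrests data and a 90% cumulative-variance threshold. Both are arbitrary choices made for this example, not recommendations from the original text.

```r
data(USArrests)
pca <- prcomp(USArrests, scale. = TRUE)

# Proportion of the total variance explained by each component.
pve <- pca$sdev^2 / sum(pca$sdev^2)
cum_pve <- cumsum(pve)

# Smallest number of components reaching the (arbitrary) 90% threshold.
k <- which(cum_pve >= 0.90)[1]

# Reduced data set: n observations by k components.
reduced <- pca$x[, 1:k, drop = FALSE]
print(dim(reduced))

# The component scores are uncorrelated, so redundancy between
# columns has been removed.
score_cor <- cor(pca$x)
print(round(score_cor, 10))
```

The reduced matrix can then be passed to other techniques, such as clustering or regression, in place of the original variables.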

5. Functions for Principal Component Analysis in R

a. prcomp() (stats)
b. princomp() (stats)
c. PCA() (FactoMineR)
d. dudi.pca() (ade4)
e. acp() (amap)

6. Methods for Principal Component Analysis in R

There are two methods for principal component analysis in R:

a. Spectral decomposition

It examines the covariances/correlations between variables. The function princomp() uses the spectral approach.

b. Singular value decomposition

It examines the covariances/correlations between individuals. The functions prcomp() and PCA() use singular value decomposition. According to the R documentation for prcomp(), this is generally the preferred method for numerical accuracy.
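The two methods can be compared directly with base R's eigen() and svd(). This is a sketch on the built-in USArrests data (an illustrative assumption). It is not the internal code of princomp() or prcomp().

```r
data(USArrests)

# Centered and scaled data matrix.
X <- scale(USArrests)
n <- nrow(X)

# Spectral decomposition: eigen-decomposition of the correlation matrix.
eig <- eigen(cor(USArrests))

# Singular value decomposition of the data matrix itself.
sv <- svd(X)

# Component variances: eigenvalues versus squared singular values / (n - 1).
print(eig$values)
print(sv$d^2 / (n - 1))

# The loading vectors agree up to an arbitrary sign per column.
max_diff <- max(abs(abs(eig$vectors) - abs(sv$v)))
print(max_diff)
```

Both routes lead to the same components. The difference is that the SVD works on the data matrix and never forms the correlation matrix explicitly.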

7. prcomp() and princomp() functions

The simplified formats of these two functions are:
prcomp(x, scale = FALSE)
princomp(x, cor = FALSE, scores = TRUE)
Arguments for prcomp()
a. x: a numeric matrix or data frame.
b. scale: a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The formal argument name is scale. (with a trailing dot); writing scale works through partial argument matching.
Arguments for princomp()
a. x: a numeric matrix or data frame.
b. cor: a logical value. If TRUE, the analysis uses the correlation matrix rather than the covariance matrix, which amounts to centering and scaling the data before the analysis.
c. scores: a logical value. If TRUE, the coordinates on each principal component are calculated.
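A short usage sketch of both functions follows, again assuming the built-in USArrests data as an example input. The object names res_prcomp and res_princomp are made up for this illustration.

```r
data(USArrests)

# PCA via singular value decomposition, on scaled variables.
res_prcomp <- prcomp(USArrests, scale. = TRUE)

# PCA via spectral decomposition of the correlation matrix.
res_princomp <- princomp(USArrests, cor = TRUE, scores = TRUE)

# Standard deviations and proportion of variance per component.
print(summary(res_prcomp))

# Loadings are stored under different names in the two objects.
print(res_prcomp$rotation)
print(unclass(res_princomp$loadings))

# Coordinates of the observations on the components.
print(head(res_prcomp$x))
print(head(res_princomp$scores))
```

On the correlation scale the two functions report the same component standard deviations, and their loadings match up to sign. Note that princomp() uses divisor n for the covariance matrix while prcomp() uses n - 1, so results on unscaled data differ slightly.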

8. Package for PCA visualization

We'll use the factoextra R package to create elegant ggplot2-based visualizations.
You can install it from CRAN:
install.packages("factoextra")
Or install the latest development version from GitHub:
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/factoextra")
Load factoextra as follows:
library(factoextra)
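Once the package is available, a PCA result can be plotted in a few lines. The sketch below assumes the built-in USArrests data and the object name res.pca, both chosen for illustration. It falls back to base R graphics when factoextra is not installed.

```r
data(USArrests)
res.pca <- prcomp(USArrests, scale. = TRUE)

if (requireNamespace("factoextra", quietly = TRUE)) {
  library(factoextra)
  # Scree plot: variance explained by each component.
  print(fviz_eig(res.pca))
  # Observations plotted on the first two components.
  print(fviz_pca_ind(res.pca))
  # Variables plotted on the first two components.
  print(fviz_pca_var(res.pca))
} else {
  # Base R fallback when factoextra is unavailable.
  screeplot(res.pca, type = "lines")
  biplot(res.pca)
}
```

The fviz_* functions return ggplot2 objects, so they can be customized further with the usual ggplot2 layers and themes.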

9. Conclusion

We have studied principal component and factor analysis in R. Along the way, we discussed its uses, functions, and components. We also looked at a package for PCA visualization.
Hope you enjoyed the learning!!
