Contents

- 1. Objective
- 2. Introduction to Principal components and Factor Analysis in R
- 3. What are Principal components in R?
- 4. Why Use Principal Components Analysis?
- 5. Functions to do principal analysis in R
- 6. Methods for Principal Component Analysis in R
- 7. prcomp() and princomp() functions
- 8. Package for PCA visualization
- 9. Conclusion

## 1. Objective

## 2. Introduction to Principal components and Factor Analysis in R

Principal component analysis is always performed on a symmetric correlation or covariance matrix, which means the underlying data must be numeric.

## 3. What are Principal components in R?

A principal component is a normalized linear combination of the original predictors in a data set. We can write the first principal component in the following way:

$$Z_1 = \Phi_{11} X_1 + \Phi_{21} X_2 + \Phi_{31} X_3 + \dots + \Phi_{p1} X_p$$

Where,

$Z_1$ is the first principal component.

$\Phi_1 = (\Phi_{11}, \Phi_{21}, \dots, \Phi_{p1})$ is the loading vector of the first principal component. The loadings are constrained so that their squares sum to 1, because loadings of arbitrarily large magnitude would otherwise produce an arbitrarily large variance. The loading vector defines the direction of the principal component ($Z_1$) along which the data vary the most. It also defines the line in p-dimensional space that is closest to the n observations, where closeness is measured by average squared Euclidean distance.

$X_1, \dots, X_p$ are the normalized predictors. Normalized predictors have a mean of zero and a standard deviation of one.
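The definition above can be checked numerically. The following is a minimal sketch, assuming the built-in USArrests data set as an example (it is not part of the original text): it normalizes the predictors, takes the first loading vector from prcomp(), and rebuilds the first principal component as a linear combination.

```r
# Normalized predictors: each column has mean 0 and standard deviation 1.
# USArrests is only an illustrative choice of data set.
x <- scale(USArrests)

pc <- prcomp(x)

# Loading vector of the first principal component
phi1 <- pc$rotation[, 1]

# The squared loadings sum to 1
sum_sq <- sum(phi1^2)

# First principal component as a linear combination of the predictors
z1 <- as.vector(x %*% phi1)

# It matches the scores that prcomp() reports for the first component
same_scores <- all.equal(z1, unname(pc$x[, 1]))

print(sum_sq)
print(same_scores)
```

The variance of z1 equals pc$sdev[1]^2, which is the largest variance any normalized linear combination of the columns can achieve.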

## 4. Why Use Principal Components Analysis?

The main aim of principal components analysis is to reveal hidden structure in a data set. In doing so, we may be able to do the following things:

a. Identify how different variables work together to create the dynamics of the system.

b. Reduce the dimensionality of the data.

c. Decrease redundancy in the data.

d. Filter some of the noise in the data.

e. Compress the data.

f. Prepare the data for further analysis using other techniques.
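As an illustration of dimensionality reduction, here is a small sketch that keeps only as many components as are needed to explain a chosen share of the total variance. The USArrests data set and the 80% threshold are assumptions made for the example, not recommendations from the text.

```r
# Illustrative data set; scale. = TRUE standardizes each variable first
pc <- prcomp(USArrests, scale. = TRUE)

# Proportion of total variance explained by each component
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
cum_var <- cumsum(var_explained)

# Hypothetical threshold: keep enough components to reach 80% of the variance
k <- which(cum_var >= 0.80)[1]

# Reduced representation of the data: the first k columns of the scores
reduced <- pc$x[, 1:k, drop = FALSE]

print(round(cum_var, 3))
print(dim(reduced))
```

The reduced matrix can then be passed on to other techniques, such as clustering or regression, in place of the original variables.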

## 5. Functions to do principal analysis in R

a. prcomp() (stats)

b. princomp() (stats)

c. PCA() (FactoMineR)

d. dudi.pca() (ade4)

e. acp() (amap)

## 6. Methods for Principal Component Analysis in R

There are two general methods for performing principal component analysis in R:

### a. Spectral decomposition

It examines the covariances/correlations between variables.

### b. Singular value decomposition

It examines the covariances/correlations between individuals.

The function princomp() uses the spectral decomposition approach, while the functions prcomp() and PCA() use singular value decomposition.
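To see how the two methods relate, the sketch below computes the component variances both ways using base R only: eigen() on the correlation matrix for the spectral route, and svd() on the centered and scaled data matrix for the singular value route. The USArrests data set is an assumed example.

```r
x <- USArrests  # illustrative built-in data set

# Spectral decomposition: eigenvalues of the correlation matrix
eig <- eigen(cor(x))
var_spectral <- eig$values

# Singular value decomposition of the centered, scaled data matrix
xs <- scale(x, center = TRUE, scale = TRUE)
sv <- svd(xs)
var_svd <- sv$d^2 / (nrow(xs) - 1)

# prcomp() is SVD-based and reports the same variances
pc <- prcomp(x, scale. = TRUE)

agree_methods <- all.equal(var_spectral, var_svd)
agree_prcomp <- all.equal(pc$sdev^2, var_svd)

print(agree_methods)
print(agree_prcomp)
```

Both routes give the same component variances up to numerical precision. The loading vectors can differ in sign between methods, which does not change the components' interpretation.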

## 7. prcomp() and princomp() functions

The simplified formats of these two functions are:

prcomp(x, scale = FALSE)

princomp(x, cor = FALSE, scores = TRUE)

**Arguments for prcomp()**

**a. x**: a numeric matrix or data frame.

**b. scale**: a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place.

**Arguments for princomp()**

**a. x**: a numeric matrix or data frame.

**b. cor**: a logical value. If TRUE, the correlation matrix is used instead of the covariance matrix, which is equivalent to centering and scaling the data before the analysis.

**c. scores**: a logical value. If TRUE, then coordinates on each principal component are calculated.
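The following sketch shows both functions applied to the same data, assuming the built-in USArrests data set as an example. Note that the full name of the scaling argument in prcomp() is scale. (with a trailing dot), which is what the code uses.

```r
x <- USArrests  # illustrative built-in data set

# prcomp(): scale. = TRUE standardizes each variable to unit variance
pc1 <- prcomp(x, scale. = TRUE)

# princomp(): cor = TRUE uses the correlation matrix; scores = TRUE keeps the scores
pc2 <- princomp(x, cor = TRUE, scores = TRUE)

print(summary(pc1))       # standard deviations and proportion of variance
print(pc1$rotation)       # loadings, one column per component
print(head(pc1$x))        # scores from prcomp()
print(head(pc2$scores))   # scores from princomp()

# With standardized variables, both functions report the same
# component standard deviations
same_sdev <- all.equal(unname(pc1$sdev), unname(pc2$sdev))
print(same_sdev)
```

The scores from the two functions are not identical: princomp() uses the divisor n when standardizing, whereas prcomp() uses n - 1, and the signs of individual components may be flipped.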

## 8. Package for PCA visualization

We'll use the factoextra R package to create elegant ggplot2-based visualizations.

**You can install it from CRAN**:

install.packages("factoextra")

Or, install the latest development version from GitHub:

if(!require(devtools)) install.packages("devtools")

devtools::install_github("kassambara/factoextra")

**Load factoextra as follows**:

library(factoextra)
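Once the package is loaded, a PCA result can be passed to its plotting functions. The sketch below is one possible usage, assuming the built-in USArrests data set and that factoextra is installed. It draws a scree plot with fviz_eig() and a variable plot with fviz_pca_var(), and skips the plotting if the package is missing.

```r
# Illustrative PCA on a built-in data set
pc <- prcomp(USArrests, scale. = TRUE)

# Only plot if factoextra is available on this machine
if (requireNamespace("factoextra", quietly = TRUE)) {
  library(factoextra)

  # Scree plot: percentage of variance explained by each component
  scree <- fviz_eig(pc)

  # Variable plot: how each original variable relates to the components
  vars <- fviz_pca_var(pc, repel = TRUE)

  print(scree)
  print(vars)
}
```

Both functions return ggplot objects, so the plots can be customized further with the usual ggplot2 layers and themes.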

## 9. Conclusion

We have studied principal components and factor analysis in R. Along the way, we discussed what principal components are, why they are used, and the functions and methods available for computing them. Finally, we looked at a package for PCA visualization.

Hope you enjoyed the learning!!