Principal component analysis (PCA) is a statistical technique used to reduce the dimensionality of large datasets while preserving as much of the important information as possible. It performs a mathematical rotation of the data to create a new set of variables called principal components (PCs), which are mutually orthogonal (uncorrelated) and ordered by the amount of variance they explain in the data. PCA is used for microarray analysis because such data typically involve thousands of gene expression variables measured across a relatively small number of samples. As each gene represents one dimension, a dataset could potentially span a 10,000- to 30,000-dimensional space. PCA reduces this high dimensionality by transforming the original gene expression variables into a smaller set of PCs that capture the majority of the variation in the data. This helps researchers visualize complex expression patterns, identify clusters of samples, detect outliers, and uncover underlying biological differences between healthy and diseased states. PCA also helps reduce noise and redundancy in microarray datasets, making downstream statistical analysis and classification more reliable and efficient. This review covers current information on the generation of PCs and the subsequent use and analysis of data generated from microarrays or similar large data sources.
Maheshwari et al. (Wed,) studied this question.
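The rotation described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a samples-by-genes matrix, computes the PCs through a singular value decomposition of the mean-centered data (one common way to obtain them), and uses entirely synthetic data. The function name `pca`, the matrix sizes, and the simulated "diseased" shift are hypothetical choices made for illustration only.

```python
import numpy as np


def pca(X, n_components=2):
    """Sketch of PCA via SVD. X is assumed to be samples x genes."""
    # Center each gene (column) so the PCs describe variance around the mean.
    Xc = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are the principal axes in gene space.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Sample coordinates on each PC (the "scores").
    scores = U[:, :n_components] * S[:n_components]
    # Fraction of total variance explained by each PC, in decreasing order.
    explained = S**2 / np.sum(S**2)
    return scores, Vt[:n_components], explained[:n_components]


# Synthetic stand-in for a microarray: few samples, many genes (assumed sizes).
rng = np.random.default_rng(0)
n_samples, n_genes = 20, 1000
X = rng.normal(size=(n_samples, n_genes))
# Hypothetical group effect: the last 10 samples are shifted in 50 genes.
X[10:, :50] += 3.0

scores, components, explained = pca(X, n_components=3)
print("variance explained by PC1-PC3:", np.round(explained, 3))
print("PC1 scores, group A:", np.round(scores[:10, 0], 1))
print("PC1 scores, group B:", np.round(scores[10:, 0], 1))
```

In this sketch the 1,000 gene variables are reduced to three PCs per sample, and plotting the first two columns of `scores` would give the kind of sample-level view used to look for clusters and outliers. Real expression data would normally need normalization and possibly per-gene scaling first, which are left out here.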