Abstract
Principal Component Analysis (PCA) is one of the most widely used approaches for analyzing multivariate datasets. Biologists use PCA to visualize data, identify patterns in large datasets, determine independent axes of variation, and reduce dimensionality for further statistical analyses. Phylogenetic PCA is an extension of regular PCA that seeks to identify the major axes of variation independent of the phylogeny. We extend these methods by estimating PCA parameters within an explicit probabilistic modeling framework. We implement multiple models of trait evolution (Brownian motion, Ornstein-Uhlenbeck, Early Burst, and Pagel’s λ) and use the Akaike Information Criterion (AIC) to select among them. We also introduce a probabilistic approach for selecting the number of principal components to retain from a PCA. We demonstrate the advantages of probabilistic PCA, such as incorporating the error, or noise, arising from dimensionality reduction, which regular PCA ignores. We use extensive simulations and an empirical dataset with 35 traits to evaluate the method’s performance. The new approach is implemented in the R package “do3PCA”, available from the CRAN repository.
Caetano et al. studied this question.
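As an illustration of two ideas mentioned in the abstract, the sketch below computes AIC scores for a set of candidate trait-evolution models and the noise variance that standard (non-phylogenetic) probabilistic PCA assigns to the discarded dimensions. It is a minimal sketch and not the procedure implemented in do3PCA. The function names, log-likelihood values, and parameter counts are hypothetical and serve only as an example. The noise-variance formula is the usual maximum-likelihood result for probabilistic PCA: the mean of the eigenvalues that are not retained.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L). Lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def select_by_aic(fits):
    """Pick the model with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) pair.
    Returns the best model name and a dict of all AIC scores.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


def ppca_noise_variance(eigenvalues, n_retained):
    """Maximum-likelihood noise variance in standard probabilistic PCA.

    This is the mean of the covariance eigenvalues that are NOT retained,
    i.e. the variance that regular PCA silently discards.
    """
    ordered = sorted(eigenvalues, reverse=True)
    discarded = ordered[n_retained:]
    if not discarded:
        raise ValueError("n_retained must be smaller than the number of eigenvalues")
    return sum(discarded) / len(discarded)


if __name__ == "__main__":
    # Hypothetical fits: (log-likelihood, number of free parameters).
    # These numbers are made up for illustration only.
    fits = {
        "BM": (-120.0, 2),
        "OU": (-118.5, 3),
        "EB": (-119.9, 3),
        "lambda": (-119.0, 3),
    }
    best, scores = select_by_aic(fits)
    print("AIC scores:", scores)
    print("Selected model:", best)

    # Hypothetical eigenvalues of a trait covariance matrix.
    print("Noise variance:", ppca_noise_variance([4.0, 2.0, 1.0, 1.0], 2))
```

Because AIC penalizes each additional parameter by 2 units, a richer model is preferred only when it raises the log-likelihood by more than one unit per extra parameter.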