The spiked covariance model has gained increasing popularity in high-dimensional data analysis. A fundamental problem is determining the number of spiked eigenvalues, K. For estimating K, most attention has focused on the top eigenvalues of the sample covariance matrix, and there has been little investigation into proper ways of using the bulk eigenvalues to estimate K. We propose a principled approach to incorporating bulk eigenvalues in the estimation of K. Our method imposes a working model on the residual covariance matrix, which is assumed to be a diagonal matrix whose entries are drawn from a gamma distribution. Under this model, the bulk eigenvalues are asymptotically close to the quantiles of a fixed parametric distribution. This motivates a two-step method: the first step uses the bulk eigenvalues to estimate the parameters of this distribution, and the second step uses these parameters to assist in estimating K. The resulting estimator K̂ aggregates information from a large number of bulk eigenvalues. We show that K̂ is consistent under a standard spiked covariance model. We also propose a confidence interval estimate for K. Extensive simulation studies show that the proposed method is robust and outperforms existing methods in a range of scenarios. We apply the proposed method to the analysis of a lung cancer microarray data set and the 1000 Genomes data set.
Ke et al. studied this question.
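The two-step idea can be illustrated with a small simulation. The Python sketch below is not the authors' estimator. It is a toy under stated assumptions. The spikes are placed on coordinate axes. Step 1 is replaced by a least-squares match between the middle bulk eigenvalues and a simulated null spectrum over a hypothetical grid of gamma shapes, which stands in for the parametric quantiles. Step 2 uses a Monte Carlo threshold: the largest top eigenvalue seen across a few null simulations from the fitted gamma model. All function names and tuning values (`shapes`, `lo`, `hi`, `reps`, the seeds) are illustrative assumptions. Eigenvalues come from a plain cyclic Jacobi routine, so the example needs only the standard library.

```python
import math
import random

# Illustrative sketch only: NOT the estimator from the paper.
# All names and tuning values below are hypothetical.


def sym_eigvals(a, sweeps=50, tol=1e-12):
    """Eigenvalues (descending) of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    p = len(a)
    for _ in range(sweeps):
        # Stop once the off-diagonal mass is negligible.
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-300:
                    continue
                # Rotation angle chosen to zero out a[i][j].
                theta = (a[j][j] - a[i][i]) / (2.0 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(p):  # A <- A J (columns i and j)
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # A <- J^T A (rows i and j)
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def sample_cov_eigvals(noise_vars, spikes, n, rng):
    """Simulate n zero-mean rows and return eigenvalues of the sample covariance X^T X / n."""
    p = len(noise_vars)
    rows = []
    for _ in range(n):
        x = [math.sqrt(v) * rng.gauss(0.0, 1.0) for v in noise_vars]
        for j, d in enumerate(spikes):  # toy assumption: spikes lie on coordinate axes
            x[j] += math.sqrt(d) * rng.gauss(0.0, 1.0)
        rows.append(x)
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(p)] for i in range(p)]
    return sym_eigvals(cov)


def fit_bulk(eigs, n, shapes, lo=0.25, hi=0.75, seed=0):
    """Step 1 (stand-in): choose the gamma shape and scale whose simulated null
    spectrum best matches the middle observed eigenvalues in least squares."""
    p = len(eigs)
    idx = range(int(lo * p), int(hi * p))  # skip the top eigenvalues, where spikes sit
    best = None
    for a in shapes:
        rng = random.Random(seed)
        ref = sample_cov_eigvals([rng.gammavariate(a, 1.0) for _ in range(p)], [], n, rng)
        # Scaling all noise variances by c scales the spectrum by c, so the best scale is closed-form.
        scale = sum(ref[i] * eigs[i] for i in idx) / sum(ref[i] ** 2 for i in idx)
        err = sum((eigs[i] - scale * ref[i]) ** 2 for i in idx)
        if best is None or err < best[0]:
            best = (err, a, scale)
    return best[1], best[2]


def estimate_k(eigs, n, shapes=(1.0, 2.0, 4.0, 8.0, 16.0), reps=5, seed=1):
    """Step 2 (stand-in): threshold at the largest top eigenvalue seen in `reps`
    null simulations from the fitted gamma model, then count eigenvalues above it."""
    a, scale = fit_bulk(eigs, n, shapes)
    rng = random.Random(seed)
    p = len(eigs)
    threshold = max(
        sample_cov_eigvals([rng.gammavariate(a, scale) for _ in range(p)], [], n, rng)[0]
        for _ in range(reps)
    )
    return sum(lam > threshold for lam in eigs), threshold


if __name__ == "__main__":
    rng = random.Random(42)
    p, n = 24, 150
    noise = [rng.gammavariate(4.0, 0.25) for _ in range(p)]  # gamma residual variances, mean 1
    eigs = sample_cov_eigvals(noise, [100.0, 60.0, 40.0], n, rng)  # three strong spikes
    k_hat, threshold = estimate_k(eigs, n)
    print("K_hat =", k_hat, "threshold =", round(threshold, 2))
```

Because only the middle of the spectrum is used for fitting, this sketch tolerates up to roughly a quarter of the eigenvalues being spikes. For realistic dimensions one would replace the Jacobi loop with a vectorized symmetric eigensolver.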