Key points are not available for this paper at this time.
In problem of sparse principal components analysis (SPCA), the goal is to use n i.i.d. samples to estimate the leading eigenvector(s) of a p times p covariance matrix, which are known a priori to be sparse, say with at most k non-zero entries. This paper studies SPCA in the high-dimensional regime, where the model dimension p, sparsity index k, and sample size n all tend to infinity simultaneously. We first analyze a simple and computationally inexpensive diagonal cut-off method, and establish a threshold of the order thetas diag = n/k 2 log(p-k) separating success from failure. We then analyze a more complex semidefinite programming (SDP) relaxation due to dpsilaAspremont et al., and prove that it succeeds once the sample size is of the order thetas sdp = n/k log(p - k). Our results thus highlight an interesting tradeoff between statistical and computational efficiency in high-dimensional estimation problems.
Amini et al. (Tue,) studied this question.