July 1, 2008

High-dimensional analysis of semidefinite relaxations for sparse principal components

Key Points

Key points are not available for this paper at this time.

Abstract

In problem of sparse principal components analysis (SPCA), the goal is to use n i.i.d. samples to estimate the leading eigenvector(s) of a p times p covariance matrix, which are known a priori to be sparse, say with at most k non-zero entries. This paper studies SPCA in the high-dimensional regime, where the model dimension p, sparsity index k, and sample size n all tend to infinity simultaneously. We first analyze a simple and computationally inexpensive diagonal cut-off method, and establish a threshold of the order thetas diag = n/k 2 log(p-k) separating success from failure. We then analyze a more complex semidefinite programming (SDP) relaxation due to dpsilaAspremont et al., and prove that it succeeds once the sample size is of the order thetas sdp = n/k log(p - k). Our results thus highlight an interesting tradeoff between statistical and computational efficiency in high-dimensional estimation problems.

High-dimensional analysis of semidefinite relaxations for sparse principal components

Key Points

Abstract

Cite This Study