Key points are not available for this paper at this time.
This paper studies the problem of shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we consider the model y = _* X _* + w, where X is an n d standard Gaussian design matrix, w is Gaussian noise with entrywise variance ², _* is an unknown n n permutation matrix, and _* is the regression coefficient, also unknown. Previous work has shown that, in the large n-limit, the minimal signal-to-noise ratio (SNR), _* ²/², for recovering the unknown permutation exactly with high probability is between n² and nC for some absolute constant C and the sharp threshold is unknown even for d=1. We show that this threshold is precisely SNR = n⁴ for exact recovery throughout the sublinear regime d=o (n). As a by-product of our analysis, we also determine the sharp threshold of almost exact recovery to be SNR = n², where all but a vanishing fraction of the permutation is reconstructed.
Lufkin et al. (Wed,) studied this question.