Key points are not available for this paper at this time.
We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution D over Rⁿ and a collection of data points in Rⁿ, distinguish between the two possibilities that (i) the data was drawn from D, versus (ii) the data was drawn from D|S, i. e. from D subject to truncation by an unknown truncation set S Rⁿ. We study this problem in the setting where D is a high-dimensional i. i. d. product distribution and S is an unknown degree-d polynomial threshold function (one of the most well-studied types of Boolean-valued function over Rⁿ). Our main results are an efficient algorithm when D is a hypercontractive distribution, and a matching lower bound: For any constant d, we give a polynomial-time algorithm which successfully distinguishes D from D|S using O (n^d/2) samples (subject to mild technical conditions on D and S) ; Even for the simplest case of D being the uniform distribution over \+1, -1\ⁿ, we show that for any constant d, any distinguishing algorithm for degree-d polynomial threshold functions must use (n^d/2) samples.
De et al. (Mon,) studied this question.