With very noisy data, overfitting is a serious problem in pattern recognition. For nonlinear regression, having plentiful data eliminates overfitting, but for nonlinear principal component analysis (NLPCA), overfitting persists even with plentiful data. Simply minimizing the mean square error is therefore not a sufficient criterion for NLPCA to find good solutions in noisy data. A new index is proposed that measures the disparity between the nonlinear principal components u and ū for a data point x and its nearest neighbour x̄. This index, 1 − C_S (where C_S is the Spearman rank correlation between u and ū), tends to increase for overfitted solutions. It thereby provides a diagnostic tool for deciding how much regularization (i.e. weight penalty) should be used in the objective function of the NLPCA to prevent overfitting. Tests are performed using autoassociative neural networks for NLPCA on synthetic and real climate data.
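A minimal sketch of how the index 1 − C_S could be computed, assuming the nonlinear principal component u has already been obtained for every data point (e.g. from the bottleneck of a trained autoassociative network, which is not shown here) and assuming the nearest neighbour x̄ is found by Euclidean distance in data space. The function names (`average_ranks`, `spearman`, `nearest_neighbour`, `inconsistency_index`) are hypothetical and chosen for illustration only. Ties are given average ranks, so C_S is the Pearson correlation of the ranks.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation C_S (undefined if either input is constant)."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)


def nearest_neighbour(X: Sequence[Sequence[float]], i: int) -> int:
    """Index of the point closest to X[i] (Euclidean; first one wins on ties)."""
    best_j, best_d = -1, math.inf
    for j, xj in enumerate(X):
        if j == i:
            continue
        d = math.dist(X[i], xj)
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def inconsistency_index(X: Sequence[Sequence[float]], u: Sequence[float]) -> float:
    """1 - C_S, with C_S the rank correlation between u(x) and u(x_bar)."""
    u_bar = [u[nearest_neighbour(X, i)] for i in range(len(X))]
    return 1.0 - spearman(u, u_bar)


if __name__ == "__main__":
    X = [(float(i), 0.0) for i in range(10)]  # toy points on a line
    smooth = [float(i) for i in range(10)]     # neighbours get similar u
    zigzag = [0, 9, 1, 8, 2, 7, 3, 6, 4, 5]    # neighbours get very different u
    print(inconsistency_index(X, smooth))  # roughly 0.02
    print(inconsistency_index(X, zigzag))  # roughly 2.0
```

In this toy example the smooth mapping, where neighbouring points receive similar component values, gives an index near 0, while the zigzag mapping, which mimics a solution that jumps wildly between neighbouring points, gives an index near 2. Under these assumptions, one would compare the index across networks trained with different weight-penalty strengths.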
William W. Hsieh studied this question.