The authors focus on the minimal network strategy. The underlying hypothesis is that if several networks fit the data equally well, the simplest one will, on average, generalize best. Inspired by the information-theoretic idea of minimum description length, a term that penalizes network complexity is added to the backpropagation cost function. The authors give the details of the procedure, called weight-elimination, describe its dynamics, and clarify the meaning of the parameters involved. From a Bayesian perspective, the complexity term can be usefully interpreted as an assumption about the prior distribution of the weights. The procedure was used to predict currency exchange rates.
Weigend et al. studied this question.
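The abstract above describes the complexity term only in words. As a rough illustration, the sketch below uses the penalty form commonly given for weight-elimination, lam * sum((w/w0)^2 / (1 + (w/w0)^2)), together with its gradient, which would be added to the backpropagation error gradient. The formula is not stated in the text above, and the names `weight_elimination_penalty`, `weight_elimination_gradient`, `w0`, and `lam` are assumptions chosen for this example, so treat it as a sketch rather than the authors' implementation.

```python
def weight_elimination_penalty(weights, w0=1.0, lam=1.0):
    """Assumed complexity term: lam * sum((w/w0)^2 / (1 + (w/w0)^2)).

    w0 is an assumed scale parameter separating "small" from "large"
    weights; lam is an assumed weighting of the penalty against the
    data-fit error.
    """
    total = 0.0
    for w in weights:
        r = (w / w0) ** 2
        total += r / (1.0 + r)
    return lam * total


def weight_elimination_gradient(weights, w0=1.0, lam=1.0):
    """Derivative of the penalty with respect to each weight.

    d/dw [r / (1 + r)] = (2 * w / w0^2) / (1 + r)^2, with r = (w / w0)^2.
    """
    grads = []
    for w in weights:
        r = (w / w0) ** 2
        grads.append(lam * (2.0 * w / w0 ** 2) / (1.0 + r) ** 2)
    return grads


if __name__ == "__main__":
    example_weights = [0.01, 0.5, 5.0]  # hypothetical weights for illustration
    print(weight_elimination_penalty(example_weights, w0=1.0, lam=0.1))
    print(weight_elimination_gradient(example_weights, w0=1.0, lam=0.1))
```

Under this form, a weight much smaller than `w0` is penalized roughly quadratically, much like ordinary weight decay, while a weight much larger than `w0` contributes close to `lam`, so the term behaves approximately like a count of the significant weights.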