The spherical perceptron with $N$ inputs and a linear output does not present optimal generalization if trained by minimization of the standard quadratic cost function $E = \frac{1}{2} \sum_{\mu=1}^{P} (b_\mu - h_\mu)^2$, where $b_\mu$ and $h_\mu$ are the outputs from the rule (teacher) and hypothesis (student) networks for the $\mu$th example and there are $P = \alpha N$ examples. We derive an optimal algorithm for on-line learning of examples which outperforms the iterative (off-line) standard algorithm for $\alpha$ up to $\approx 0.71$. The on-line optimized algorithm suggests a class of cost functions for off-line learning, which we then proceed to study using the replica method. The optimized cost function within that class has the suggestive form $E = \frac{1}{\beta} \sum_{\mu=1}^{P} \left[ -\ln P(b_\mu \mid h_\mu) - \ln Z \right]$, where $Z$ is a normalization constant, $P(b_\mu \mid h_\mu)$ is the conditional probability of the output data $b_\mu$ given the hypothesis output $h_\mu$, and $\beta$ is a learning parameter analogous to a temperature which decreases in a well-defined manner along the learning process.
Kinouchi et al. studied this question.
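
To make the standard quadratic cost concrete, the following is a minimal sketch of a teacher-student linear perceptron and one off-line gradient step on $E = \frac{1}{2} \sum_\mu (b_\mu - h_\mu)^2$. It is an illustration only, not the optimized algorithm of the paper: the $1/\sqrt{N}$ scaling of the linear output, the Gaussian inputs, the learning rate, and all function names (`field`, `quadratic_cost`, `gradient_step`, `make_problem`) are assumptions introduced here, and the spherical normalization constraint is omitted for brevity.

```python
import math
import random


def field(w, x):
    """Linear output w.x / sqrt(N); the 1/sqrt(N) scaling is an assumed convention."""
    return sum(wi * xi for wi, xi in zip(w, x)) / math.sqrt(len(w))


def quadratic_cost(student, teacher, examples):
    """E = 1/2 * sum over examples of (b_mu - h_mu)^2."""
    return 0.5 * sum(
        (field(teacher, x) - field(student, x)) ** 2 for x in examples
    )


def gradient_step(student, teacher, examples, lr=0.1):
    """One plain gradient-descent step on the quadratic cost (no spherical constraint)."""
    n = len(student)
    grad = [0.0] * n
    for x in examples:
        err = field(teacher, x) - field(student, x)  # b_mu - h_mu
        for i in range(n):
            grad[i] -= err * x[i] / math.sqrt(n)  # dE/dJ_i contribution
    return [s - lr * g for s, g in zip(student, grad)]


def make_problem(n=20, alpha=0.5, seed=0):
    """Random teacher, random student, and P = alpha * N Gaussian examples (assumed setup)."""
    rng = random.Random(seed)
    teacher = [rng.gauss(0, 1) for _ in range(n)]
    student = [rng.gauss(0, 1) for _ in range(n)]
    p = int(alpha * n)
    examples = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    return student, teacher, examples


if __name__ == "__main__":
    student, teacher, examples = make_problem()
    before = quadratic_cost(student, teacher, examples)
    student = gradient_step(student, teacher, examples)
    after = quadratic_cost(student, teacher, examples)
    print(f"cost before step: {before:.4f}, after step: {after:.4f}")
```

Running the script prints the training cost before and after a single step. Driving this cost to zero on the $P$ training examples does not by itself give optimal generalization, which is the point the abstract makes about the standard quadratic cost.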