Three techniques that offer significant improvements in training time are discussed. In the first, training is restricted to those samples that the network fails to predict correctly; as the performance of the network improves, training is extended to the entire training data set. In the second, an acceleration process is applied to neurons that produce the same output class for the inputs provided by the training sample. In the third, the learning rate is optimized on the fly to obtain the best improvement for each training pass. A derivation is presented for an optimal matching of momentum and learning rate.
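The first technique can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single sigmoid unit rather than a multilayer network, a plain gradient step, and a 0.5 decision threshold, and the names `train_on_failures`, `classify`, `lr`, and `max_passes` are hypothetical. It shows only the restriction of each pass to currently misclassified samples; the later extension to the full training set is not modelled.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classify(weights, bias, x):
    """Return the predicted class (0 or 1) of a single sigmoid unit."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(z) >= 0.5 else 0


def train_on_failures(samples, labels, lr=0.5, max_passes=1000):
    """Train a single sigmoid unit using only misclassified samples.

    Illustrative assumption: each pass accumulates the update over the
    samples the unit currently gets wrong, then applies it once.
    Returns (weights, bias, number_of_passes_used).
    """
    n = len(samples[0])
    weights = [0.0] * n
    bias = 0.0
    for passes in range(max_passes):
        # Keep only the samples the unit currently fails on.
        failing = [(x, y) for x, y in zip(samples, labels)
                   if classify(weights, bias, x) != y]
        if not failing:
            return weights, bias, passes
        # Accumulate the update over the failing subset only.
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in failing:
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = y - sigmoid(z)
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w + lr * g for w, g in zip(weights, grad_w)]
        bias += lr * grad_b
    return weights, bias, max_passes
```

For example, `train_on_failures([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)], [0, 0, 0, 1])` trains the unit on the logical AND problem. Correctly predicted samples contribute no computation to a pass beyond the forward evaluation, which is where the saving in training time would come from under these assumptions.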
Allred et al. studied this question.