Key points are not available for this paper at this time.
A class of learning tasks is described that combines aspects of learning automation tasks and supervised learning pattern-classification tasks. These tasks are called associative reinforcement learning tasks. An algorithm is presented, called the associative reward-penalty, or A R-P algorithm for which a form of optimal performance is proved. This algorithm simultaneously generalizes a class of stochastic learning automata and a class of supervised learning pattern-classification methods related to the Robbins-Monro stochastic approximation procedure. The relevance of this hybrid algorithm is discussed with respect to the collective behaviour of learning automata and the behaviour of networks of pattern-classifying adaptive elements. Simulation results are presented that illustrate the associative reinforcement learning task and the performance of the A R-P algorithm as compared with that of several existing algorithms.
Barto et al. (Wed,) studied this question.