This article studies reinforcement learning (RL)-based fuzzy control for nonlinear systems with unknown dynamics via a parallel composite policy iteration (PCPI) scheme. The main objective is to solve the fuzzy algebraic Riccati equation (FARE), which is inherently complex and cannot be readily solved analytically. Existing policy iteration (PI) and value iteration (VI) algorithms have been widely used to address this problem. However, these algorithms suffer from the need for an initial stabilizing control policy, the persistent excitation (PE) condition, and large amounts of data. To alleviate these drawbacks, a novel PCPI algorithm is proposed. Specifically, for each fuzzy subsystem, an adaptive parameter is designed to eliminate the need for an initial stabilizing control policy. In addition, an online model-free PCPI algorithm is proposed for the case where the dynamics of the fuzzy system are difficult to obtain. By using stored historical data in place of online data, the PE condition is relaxed to the initial excitation (IE) condition. Moreover, the corresponding algorithm can be executed independently and in parallel under each fuzzy rule, thereby making full use of the available computational resources. Finally, the effectiveness of the proposed algorithms is verified through experiments on a single-link robot arm and a quarter-car active suspension (QCAS) system.
Liu et al. studied this question.
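To make the role of the initial stabilizing policy concrete, the following is a minimal sketch of classical (Kleinman-style) policy iteration on a scalar algebraic Riccati equation. It is not the PCPI algorithm of the article. The scalar model `dx/dt = a*x + b*u`, the cost weights `q` and `r`, the function names, and the two "rule" parameter sets are all hypothetical and chosen only for illustration. The sketch assumes known dynamics, so it shows the model-based baseline that the article sets out to improve on.

```python
import math


def scalar_policy_iteration(a, b, q, r, k0, iters=20):
    """Classical policy iteration for the scalar ARE
    2*a*p - (b*p)**2 / r + q = 0, with control law u = -k*x.

    Assumption (illustrative only): dynamics dx/dt = a*x + b*u are known.
    """
    # Classical PI needs a stabilizing initial gain; this is the
    # requirement the article aims to remove.
    if a - b * k0 >= 0:
        raise ValueError("k0 must stabilize the loop: a - b*k0 < 0")
    k = k0
    history = []
    for _ in range(iters):
        # Policy evaluation: solve the scalar Lyapunov equation
        # 2*(a - b*k)*p + q + r*k**2 = 0 for p.
        p = (q + r * k ** 2) / (2.0 * (b * k - a))
        # Policy improvement: update the feedback gain from p.
        k = b * p / r
        history.append(p)
    return p, k, history


def scalar_are_solution(a, b, q, r):
    """Closed-form positive root of the scalar ARE, used as a reference."""
    return r * (a + math.sqrt(a ** 2 + b ** 2 * q / r)) / b ** 2


# Hypothetical local models (a, b, q, r, k0), one per fuzzy rule.
# Each iteration is independent of the others, so the loop below could
# be dispatched to separate workers.
rules = [(1.0, 1.0, 1.0, 1.0, 3.0), (-0.5, 2.0, 1.0, 0.5, 1.0)]
results = [scalar_policy_iteration(*rule) for rule in rules]
```

In this sketch, calling `scalar_policy_iteration` with a gain that does not stabilize the loop raises an error, which mirrors why classical PI cannot start from an arbitrary policy. The iterates in `history` can be compared against `scalar_are_solution` to check convergence for each hypothetical rule.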