This article proposes a novel discounted inverse reinforcement learning (DIRL) algorithm for linear quadratic (LQ) control of unknown continuous-time (CT) systems with partially observable states and an unknown discounted value function. Existing DIRL methods predominantly rely on full-state feedback, which limits their applicability in practical scenarios where only input-output data are available. To address this limitation, a state reconstruction method is designed for the expert-controlled system using the measured desired output. Building on this reconstruction, a model-free output-feedback (OPFB) DIRL algorithm is presented that iteratively solves for the unknown value function and the corresponding optimal OPFB control policy, which is equivalent to the expert control policy. The convergence of the proposed algorithm and the nonuniqueness of its solutions are rigorously analyzed. Finally, comprehensive simulations demonstrate the effectiveness of the proposed algorithm in recovering the expert control policy and its higher computational efficiency compared with state-of-the-art (SOTA) methods.
Wu et al. (Thu,) studied this question.
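The nonuniqueness mentioned in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the article's OPFB algorithm: it assumes a scalar, full-state system dx/dt = a*x + b*u with the standard discounted cost, the integral of exp(-gamma*t) * (q*x^2 + r*u^2), and a known model. The function name `discounted_lq_gain` and all numerical values are hypothetical and chosen only for illustration. Under these assumptions the value function is V(x) = p*x^2, where p solves the discounted scalar Riccati equation 2*(a - gamma/2)*p + q - (b*p)^2/r = 0, and the optimal policy is u = -k*x with k = b*p/r.

```python
import math


def discounted_lq_gain(a, b, q, r, gamma):
    """Scalar discounted LQ problem (illustrative assumption, not the paper's method).

    Returns (p, k), where p is the positive root of
        2*(a - gamma/2)*p + q - (b*p)**2 / r = 0
    and k = b*p/r is the gain of the optimal policy u = -k*x.
    """
    a_shift = a - gamma / 2.0  # the discount acts like a shift of the drift term
    p = r * (a_shift + math.sqrt(a_shift**2 + b**2 * q / r)) / b**2
    k = b * p / r
    return p, k


# Hypothetical system and discount factor.
a, b, gamma = 1.0, 2.0, 0.5

# Two different cost weightings: the second is the first scaled by 2.
p1, k1 = discounted_lq_gain(a, b, q=3.0, r=1.0, gamma=gamma)
p2, k2 = discounted_lq_gain(a, b, q=6.0, r=2.0, gamma=gamma)

# Residual of the Riccati equation for the first weighting (should be ~0).
residual = 2.0 * (a - gamma / 2.0) * p1 + 3.0 - (b * p1) ** 2 / 1.0

print("gains:", k1, k2)
print("value-function coefficients:", p1, p2)
print("Riccati residual:", residual)
```

In this sketch the two weightings yield the same gain while the value-function coefficient doubles, so an observer who sees only the expert's policy cannot tell the two cost functions apart. This is one simple source of nonuniqueness in inverse problems of this kind; the article's own analysis may cover a broader class of equivalent solutions.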