What question did this study set out to answer?

The study asks how the choice of identified plant model affects safe, sample-conscious controller synthesis for nonlinear systems when reinforcement learning is trained off-plant on that model.

February 2, 2026 | Open Access

Modeling for Data Efficiency: System Identification as a Precursor to Reinforcement Learning for Nonlinear Systems

Key Points

The study examines a nonlinear mass–spring–damper system with hardening effects and hard stops.
Two data-driven surrogate models were developed for reinforcement learning: a piecewise linear model and a global nonlinear autoregressive model with exogenous input (NLARX).
Unit step reference tracking was tested on the true plant using policies trained on each surrogate.
The piecewise linear model showed a lower mean absolute error (MAE = 0.03) than the NLARX model (MAE = 0.31).
The improved performance came at a higher identification cost (60,000 samples vs. 12,000 samples).
Controller performance depends on modeling choices, state discretization, and reward shaping, while off-plant training contains experimental risk.
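The tracking metrics quoted above can be illustrated with a minimal sketch. The paper's exact definitions are not given on this page, so this assumes MAE is the mean of |reference - output| over the run and that the steady-state figure is the percentage gap between the reference and the average of the final samples. The function name, the `tail_fraction` parameter, and the synthetic response are hypothetical and are not the study's data.

```python
import numpy as np

def step_tracking_metrics(y, reference=1.0, tail_fraction=0.1):
    """Return (MAE, steady-state error in percent) for a step response.

    Assumed definitions (not confirmed by the source):
    - MAE: mean absolute tracking error over the whole run.
    - Steady-state error: |reference - mean of the last samples| / |reference|.
    """
    y = np.asarray(y, dtype=float)
    mae = float(np.mean(np.abs(reference - y)))
    n_tail = max(1, int(len(y) * tail_fraction))  # samples treated as "settled"
    y_final = float(np.mean(y[-n_tail:]))
    sse_percent = 100.0 * abs(reference - y_final) / abs(reference)
    return mae, sse_percent

# Synthetic first-order response that settles at 0.97 (illustrative only).
t = np.linspace(0.0, 10.0, 1001)
y = 0.97 * (1.0 - np.exp(-2.0 * t))
mae, sse_percent = step_tracking_metrics(y)
print(f"MAE = {mae:.3f}, steady-state error = {sse_percent:.1f}%")
```

Because MAE averages over the transient as well as the settled portion, it is generally larger than the steady-state offset alone, so the two numbers are best read together.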

Abstract

Safe and sample-conscious controller synthesis for nonlinear dynamics benefits from reinforcement learning that exploits an explicit plant model. A nonlinear mass–spring–damper with hardening effects and hard stops is studied, and off-plant Q-learning is enabled using two data-driven surrogates: (i) a piecewise linear model assembled from operating region transfer function estimates and blended by triangular memberships, and (ii) a global nonlinear autoregressive model with exogenous input (NLARX) constructed from past inputs and outputs. In unit step reference tracking on the true plant, the piecewise linear route yields lower error and reduced steady-state bias (MAE = 0.03; SSE = 3%) compared with the NLARX route (MAE = 0.31; SSE = 30%) in the reported configuration. The improved regulation is obtained at a higher identification cost (60,000 samples versus 12,000 samples), reflecting a fidelity–knowledge–data trade-off between localized linearization and global nonlinear regression. All reported performance metrics correspond to deterministic validation runs using fixed surrogate models and trained policies and are intended to support methodological comparison rather than statistical performance characterization. These results indicate that model-based Q-learning with identified surrogates enables off-plant policy training while containing experimental risk and that performance depends on modeling choices, state discretization, and reward shaping.
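The blending of local models by triangular memberships can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the scheduling variable is taken to be the previous output, each local model is a hypothetical first-order difference equation y[k+1] = a*y[k] + b*u[k], and the region centers and (a, b) values are placeholders rather than identified parameters.

```python
import numpy as np

def triangular_memberships(x, centers):
    """Weights of overlapping triangular memberships centered at `centers`.

    `centers` must be sorted ascending with at least two entries. At most two
    adjacent regions are active, and the weights always sum to 1.
    """
    centers = np.asarray(centers, dtype=float)
    x = float(np.clip(x, centers[0], centers[-1]))  # saturate outside the range
    weights = np.zeros(len(centers))
    i = int(np.searchsorted(centers, x, side="right")) - 1
    i = min(i, len(centers) - 2)  # keep i + 1 a valid index at the upper edge
    frac = (x - centers[i]) / (centers[i + 1] - centers[i])
    weights[i] = 1.0 - frac
    weights[i + 1] = frac
    return weights

def blended_step(y_prev, u_prev, centers, local_params):
    """One-step prediction as the membership-weighted sum of local models.

    `local_params` holds one hypothetical (a, b) pair per operating region.
    """
    weights = triangular_memberships(y_prev, centers)
    local_predictions = np.array([a * y_prev + b * u_prev for a, b in local_params])
    return float(weights @ local_predictions)

# Placeholder operating regions and local parameters (assumptions).
centers = [0.0, 0.5, 1.0]
local_params = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.3)]
y_next = blended_step(y_prev=0.25, u_prev=1.0, centers=centers, local_params=local_params)
```

Halfway between two centers the neighboring local models contribute equally, and exactly at a center only that region's model is active, which gives a continuous transition between operating regions.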

Authors

Nusrat Farheen

Golam Gause Jaman

Marco P. Schoen

Journals

Machines

Institutions

Idaho State University
