Electric vehicles (EVs) represent a significant shift in transportation and energy consumption, and their integration requires efficient power management and grid stabilization. Voltage Source Inverters (VSIs) play a crucial role in three-phase charging infrastructure by enabling bidirectional power exchange for Grid-to-Vehicle (G2V) and Vehicle-to-Grid (V2G) applications. This paper presents a Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning controller that regulates real and reactive power flow in a grid-connected EV system. With the control problem modelled as a Markov Decision Process (MDP), the controller learns switching actions that manage voltage and current, enabling four-quadrant operation. Experimental results demonstrate that the TD3 controller outperforms the conventional Proportional–Integral (PI) controller. While the PI controller exhibits approximately 4% error in power tracking, the TD3 agent reduces active and reactive power tracking errors to 1% and 1.02% respectively, representing an improvement of 96.4% during G2V operation and 95.1% during V2G operation. Furthermore, the system maintains power quality within acceptable limits, achieving a grid-current Total Harmonic Distortion (THD) of 4.63% in G2V mode and below 5% in V2G mode. These findings confirm that the TD3-based VSI controller enables stable, accurate, and intelligent bidirectional power control, maximising the active participation of EVs in future smart grid management.

• A comprehensive TD3-based controller is developed for three-phase EV G2V/V2G power flow management.
• The proposed model demonstrates full four-quadrant operation (P–Q plane) with enhanced tracking accuracy and robustness.
• Unlike classical PI and Model-Free Predictive Control (MF-PC) approaches, the controller maintains stability under grid disturbances such as voltage sag, swell, and harmonic injection.
• A complete simulation framework is developed, including converter modelling, reward shaping, RL training environment, and deployment pipeline.
• The study incorporates C-rate–dependent EV battery stress analysis, which is rarely considered in EV-grid interaction research.
• Extensive comparative analysis shows that the proposed controller reduces steady-state error to below 1% and maintains THD within IEEE-519 standards.
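The tracking-error, THD, and four-quadrant figures quoted above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's evaluation code: the function names (`tracking_error_pct`, `thd_pct`, `quadrant`) are hypothetical, the error metric is assumed to be mean absolute error relative to the reference, and the sign convention (positive P for G2V charging, positive Q for inductive operation) is an assumption that may differ from the one used in the study.

```python
import numpy as np


def tracking_error_pct(reference, measured):
    """Mean absolute tracking error as a percentage of the mean reference magnitude.

    Assumed metric definition; the paper may define tracking error differently.
    """
    reference = np.asarray(reference, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return 100.0 * np.mean(np.abs(measured - reference)) / np.mean(np.abs(reference))


def thd_pct(signal, fs, f0, n_harmonics=40):
    """Total harmonic distortion in percent.

    Computed as the root-sum-square of harmonics 2..n_harmonics divided by
    the fundamental magnitude. The window should span an integer number of
    fundamental cycles so that each harmonic falls on an FFT bin.
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    spectrum = np.abs(np.fft.rfft(signal)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    def magnitude_at(freq):
        # Pick the FFT bin closest to the requested frequency.
        return spectrum[np.argmin(np.abs(freqs - freq))]

    fundamental = magnitude_at(f0)
    harmonics = [
        magnitude_at(k * f0)
        for k in range(2, n_harmonics + 1)
        if k * f0 < fs / 2  # skip harmonics above the Nyquist frequency
    ]
    return 100.0 * np.sqrt(np.sum(np.square(harmonics))) / fundamental


def quadrant(p, q):
    """Label an operating point in the P-Q plane.

    Assumed convention: P > 0 is G2V (charging), P < 0 is V2G (discharging);
    Q > 0 is inductive, Q < 0 is capacitive.
    """
    direction = "G2V" if p >= 0 else "V2G"
    reactive = "inductive" if q >= 0 else "capacitive"
    return f"{direction}, {reactive}"


if __name__ == "__main__":
    fs, f0 = 10_000, 50            # sampling and grid frequency in Hz (illustrative)
    t = np.arange(2000) / fs       # 0.2 s window, exactly 10 fundamental cycles
    current = (
        np.sin(2 * np.pi * f0 * t)
        + 0.03 * np.sin(2 * np.pi * 5 * f0 * t)
        + 0.04 * np.sin(2 * np.pi * 7 * f0 * t)
    )
    print(thd_pct(current, fs, f0))
    print(tracking_error_pct([100.0, 100.0], [99.0, 101.0]))
    print(quadrant(-5000.0, 1200.0))
```

The synthetic current carries 3% fifth-harmonic and 4% seventh-harmonic content, so its THD is 5% by construction, which is the boundary value the V2G result is compared against.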
S. et al. (Fri,) studied this question.