What question did this study set out to answer?

This research aims to develop and compare two methodologies for optimizing energy trading in grid-connected photovoltaic-battery systems.

May 27, 2026 | Open Access

Bidirectional Long Short-Term Memory-Driven Control for Grid-Connected Photovoltaic-Battery Energy Trading Systems: Mixed-Integer Linear Programming Optimization and Online Deep Reinforcement Learning

Key Points

Developed and compared two forecast-driven methodologies for energy trading in a grid-connected photovoltaic-battery system.
Developed a mixed-integer linear programming optimizer using price forecasts for daily scheduling.
Implemented an online twin delayed deep deterministic policy gradient controller for dynamic scheduling.
Trained the control models on historical data from 2019 to 2022, validated them chronologically on 2023 data, and tested them on 2024 data.
BiLSTM-MILP framework achieved EUR 10,928.7 revenue, 82.67% of the day-ahead omniscience benchmark.
Online BiLSTM-TD3 controller generated EUR 10,884.9, 82.34% of the benchmark and 99.60% of BiLSTM-MILP revenue.
The online BiLSTM-TD3 controller outperformed a rule-based baseline by 34.9%.
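
The reported percentages can be cross-checked with a few lines of arithmetic. The sketch below uses only the two revenue figures and the three percentages quoted above. The implied benchmark and baseline revenues are derived values, not figures reported by the study, and the baseline calculation assumes the 34.9% improvement is measured relative to the baseline's own revenue and applies to the BiLSTM-TD3 controller. All variable and function names are illustrative.

```python
# Figures quoted in the Key Points (EUR over the 2024 test horizon).
milp_revenue = 10928.7   # BiLSTM-MILP framework
td3_revenue = 10884.9    # online BiLSTM-TD3 controller


def share(value, reference):
    """Return value as a percentage of reference, rounded to two decimals."""
    return round(100.0 * value / reference, 2)


# TD3 revenue relative to MILP revenue (quoted as 99.60%).
td3_vs_milp = share(td3_revenue, milp_revenue)

# Benchmark revenue implied by each quoted share (derived, not reported).
implied_benchmark_from_milp = milp_revenue / 0.8267
implied_benchmark_from_td3 = td3_revenue / 0.8234

# Baseline revenue implied by the 34.9% improvement (derived, and assuming
# the improvement is expressed relative to the baseline revenue).
implied_baseline = td3_revenue / 1.349

print(td3_vs_milp)
print(round(implied_benchmark_from_milp, 1), round(implied_benchmark_from_td3, 1))
print(round(implied_baseline, 1))
```

Under these assumptions the two implied benchmark values agree to within rounding of the quoted percentages, at roughly EUR 13,220, which suggests the quoted shares are internally consistent.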

Abstract

This paper presents two forecast-driven energy trading methodologies for a grid-connected photovoltaic-battery system participating in the day-ahead electricity market. Both methodologies use bidirectional long short-term memory neural networks with attention to forecast electricity prices, but they differ in the way the resulting forecasts are converted into operational decisions. The first method uses 24- to 48 h-ahead price forecasts within a mixed-integer linear programming rolling-horizon optimizer to compute the revenue-maximizing schedule for the following day. The second method uses an online twin delayed deep deterministic policy gradient controller that outputs a complete 24 h charge–discharge schedule once per day, using state information that includes battery state, recent price history, forecast prices, and forecast photovoltaic production. The control models are trained using historical data from 2019 to 2022, validated chronologically on 2023 data, and tested on the 2024 annual horizon, while the price forecaster is trained and validated on non-2024 data and evaluated on the held-out 2024 test period. In the realistic execution setting, schedules are planned using forecast photovoltaic production and implemented against actual photovoltaic production, while the day-ahead omniscience benchmark uses actual next-day prices and actual PV production as ideal scheduling inputs. The BiLSTM-MILP framework achieves EUR 10,928.7 over the 2024 test horizon, corresponding to 82.67% of the day-ahead omniscience benchmark. The online BiLSTM-TD3 controller achieves EUR 10,884.9, corresponding to 82.34% of the same benchmark and 99.60% of the BiLSTM-MILP revenue, while outperforming a rule-based baseline by 34.9%. These results show that online deep reinforcement learning can approach the performance of explicit mathematical optimization in day-ahead PV-battery trading while substantially improving over simple rule-based operation. 
Overall, the results indicate that BiLSTM-based forecasts can support both optimization-based and reinforcement-learning-based day-ahead control for the examined PV-battery system.
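
The idea of planning against forecast prices, settling against actual prices, and comparing the result with an omniscient schedule can be illustrated with a toy example. The sketch below is not the paper's MILP formulation: it swaps in dynamic programming over a discretized state of charge, and it assumes a lossless battery, no PV production, one energy step per hour, and made-up prices. The names `plan_schedule` and `settle` are hypothetical.

```python
def plan_schedule(prices, levels=1, start_level=0):
    """Plan hourly actions (+1 charge, 0 idle, -1 discharge) that maximize
    revenue under `prices` for a lossless battery with `levels` energy steps.

    Dynamic programming over a discretized state of charge; a toy stand-in
    for an optimizer, not the MILP described in the abstract.
    """
    hours = len(prices)
    # value[t][s]: best revenue obtainable from hour t onward at level s.
    value = [[0.0] * (levels + 1) for _ in range(hours + 1)]
    choice = [[0] * (levels + 1) for _ in range(hours)]
    for t in range(hours - 1, -1, -1):
        for s in range(levels + 1):
            best, best_action = float("-inf"), 0
            for action in (-1, 0, 1):
                nxt = s + action
                if 0 <= nxt <= levels:
                    # Charging buys energy (cost), discharging sells it.
                    total = -action * prices[t] + value[t + 1][nxt]
                    if total > best:
                        best, best_action = total, action
            value[t][s] = best
            choice[t][s] = best_action
    # Forward pass: follow the stored decisions from the starting level.
    schedule, s = [], start_level
    for t in range(hours):
        schedule.append(choice[t][s])
        s += choice[t][s]
    return schedule


def settle(schedule, prices):
    """Revenue of a fixed schedule when settled at the given prices."""
    return sum(-action * price for action, price in zip(schedule, prices))


# Made-up six-hour example (prices in EUR per energy step).
actual = [30, 20, 50, 40, 10, 60]
forecast = [25, 30, 45, 40, 15, 55]

planned = plan_schedule(forecast)               # plan with forecast prices
realized = settle(planned, actual)              # settle at actual prices
omniscient = settle(plan_schedule(actual), actual)
capture = realized / omniscient

print(planned, realized, omniscient, capture)
# → [1, 0, -1, 0, 1, -1] 70 80 0.875
```

In this toy case the forecast misplaces the cheapest early hour, so the forecast-driven schedule captures 87.5% of the omniscient revenue. The percentages reported in the abstract express the same kind of ratio for the full 2024 test horizon.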
