This paper presents two forecast-driven energy trading methodologies for a grid-connected photovoltaic-battery (PV-battery) system participating in the day-ahead electricity market. Both methodologies use bidirectional long short-term memory (BiLSTM) neural networks with attention to forecast electricity prices, but they differ in how the resulting forecasts are converted into operational decisions. The first method feeds 24 to 48 h-ahead price forecasts into a mixed-integer linear programming (MILP) rolling-horizon optimizer that computes the revenue-maximizing schedule for the following day. The second method uses an online twin delayed deep deterministic policy gradient (TD3) controller that outputs a complete 24 h charge–discharge schedule once per day from a state that includes the battery state, recent price history, forecast prices, and forecast PV production. The control models are trained on historical data from 2019 to 2022, validated chronologically on 2023 data, and tested over the 2024 annual horizon; the price forecaster is trained and validated on non-2024 data and evaluated on the held-out 2024 test period. In the realistic execution setting, schedules are planned using forecast PV production and executed against actual PV production, whereas the day-ahead omniscience benchmark uses actual next-day prices and actual PV production as ideal scheduling inputs. Over the 2024 test horizon, the BiLSTM-MILP framework achieves a revenue of EUR 10,928.7, or 82.67% of the day-ahead omniscience benchmark. The online BiLSTM-TD3 controller achieves EUR 10,884.9, or 82.34% of the same benchmark and 99.60% of the BiLSTM-MILP revenue, and outperforms a rule-based baseline by 34.9%. These results show that online deep reinforcement learning can approach the performance of explicit mathematical optimization in day-ahead PV-battery trading while substantially improving on simple rule-based operation.
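To make the day-ahead scheduling step more concrete, the following sketch computes a revenue-maximizing 24 h battery schedule from hourly price and PV forecasts. It is not the paper's MILP formulation: it swaps in exact backward dynamic programming over a discretized state of charge (SOC), which avoids a solver dependency, and it solves a single daily window rather than a rolling horizon. The function name `schedule_day`, the battery parameters (`capacity`, power limit `p_max`, efficiencies `eta_c` and `eta_d`), the assumption that energy is both exported and imported at the day-ahead price, and the zero value assigned to energy left in storage at the end of the day are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def schedule_day(prices: List[float], pv: List[float],
                 capacity: float = 10.0, p_max: float = 5.0,
                 soc0: float = 0.0, step: float = 1.0,
                 eta_c: float = 0.95, eta_d: float = 0.95) -> Tuple[float, List[float]]:
    """Hypothetical toy scheduler (not the paper's MILP): maximize
    sum_t price_t * export_t by backward dynamic programming over SOC levels.

    Returns (revenue, soc_changes), where soc_changes[t] > 0 means charging.
    Assumptions: hourly steps, grid import/export at the same price,
    stored energy remaining at the end of the day is worth nothing.
    """
    n_levels = int(round(capacity / step)) + 1
    max_move = int(p_max // step)          # SOC levels reachable in one hour
    start = int(round(soc0 / step))
    horizon = len(prices)
    # value[t][s]: best revenue from hour t to the end, starting at SOC level s
    value = [[0.0] * n_levels for _ in range(horizon + 1)]
    choice = [[0] * n_levels for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n_levels):
            best, best_k = float("-inf"), 0
            for k in range(-max_move, max_move + 1):
                s_next = s + k
                if not 0 <= s_next < n_levels:
                    continue  # move would violate SOC bounds
                delta = k * step
                if delta >= 0:
                    # charging draws delta / eta_c from PV or the grid
                    export = pv[t] - delta / eta_c
                else:
                    # discharging delivers |delta| * eta_d to the grid
                    export = pv[t] - delta * eta_d
                v = prices[t] * export + value[t + 1][s_next]
                if v > best:
                    best, best_k = v, k
            value[t][s] = best
            choice[t][s] = best_k
    # Forward pass: follow the stored decisions from the initial SOC
    s, moves = start, []
    for t in range(horizon):
        k = choice[t][s]
        moves.append(k * step)
        s += k
    return value[0][start], moves


revenue, moves = schedule_day([10.0, 50.0], [0.0, 0.0], eta_c=1.0, eta_d=1.0)
print(revenue, moves)  # 200.0 [5.0, -5.0]
```

Under the realistic execution setting described above, a scheduler of this kind would be run on forecast prices and forecast PV, and the resulting schedule's revenue would then be re-evaluated against actual PV production; running it on actual prices and PV instead gives an omniscience-style upper bound for this toy model.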
Overall, the results indicate that BiLSTM-based forecasts can support both optimization-based and reinforcement-learning-based day-ahead control for the examined PV-battery system.
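As a consistency check on the reported figures, the short script below back-calculates the omniscience benchmark revenue implied by each controller's revenue and benchmark share, the TD3-to-MILP revenue ratio, and the rule-based baseline revenue. The benchmark value itself is not stated in the text and is only inferred here; reading "outperforms by 34.9%" as a relative revenue gain over the baseline is likewise an assumption.

```python
# Figures as reported in the abstract (EUR over the 2024 test horizon)
milp_revenue = 10928.7
td3_revenue = 10884.9
milp_share = 0.8267   # BiLSTM-MILP share of the omniscience benchmark
td3_share = 0.8234    # BiLSTM-TD3 share of the same benchmark

# Implied benchmark revenue; the two estimates should agree up to rounding
benchmark_from_milp = milp_revenue / milp_share
benchmark_from_td3 = td3_revenue / td3_share

# TD3 revenue relative to MILP revenue (reported as 99.60%)
td3_vs_milp = td3_revenue / milp_revenue

# Assumption: "outperforms by 34.9%" means (td3 - baseline) / baseline = 0.349
implied_baseline = td3_revenue / 1.349

print(f"benchmark ~ EUR {benchmark_from_milp:.1f} / {benchmark_from_td3:.1f}")
print(f"TD3/MILP = {100 * td3_vs_milp:.2f}%")
print(f"implied rule-based baseline ~ EUR {implied_baseline:.1f}")
```

Both benchmark estimates fall close to EUR 13,219, which suggests the two reported percentages refer to the same benchmark, and the TD3-to-MILP ratio rounds to 99.60%, matching the text.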
Vamvouras et al. studied this question.