Strategic bidding for wind–battery hybrid systems is increasingly critical as electricity spot markets transition toward market-oriented mechanisms, particularly in Chinese pilot regions. However, dual uncertainties, namely wind generation variability and volatile locational marginal prices (LMPs), expose market participants to significant financial tail risk. This study develops a risk-constrained reinforcement learning framework for optimal bidding of wind–storage hybrid systems. We employ soft actor–critic (SAC) for continuous action control and integrate conditional value-at-risk (CVaR) into reward design to explicitly penalize low-probability, high-loss outcomes. The framework incorporates realistic operational constraints, including linearized battery degradation costs and a market-compatible single-bid abstraction for hourly settlement. Using one-year historical operational data from a 150 MW wind farm (with a 91-day test period), we find that storage integration increases annual profit by 108.4–114.2% relative to wind-only operation. Critically, the SAC–CVaR policy (η = 0.35) preserves 97.3% of risk-neutral profit (7.71 M vs. 7.93 M) while substantially mitigating downside risk: CVaR@95% improves by 42.4% (−549 vs. −952) and VaR@95% improves by 30.1% (−275 vs. −393). The trained policy achieves sub-millisecond inference (0.262 ms per decision, ~3820 decisions/s), corresponding to a 3.8 × 10⁴–5.7 × 10⁴× speedup over optimization-based solvers (10–15 s per decision), enabling real-time deployment. Behavioral analysis reveals that the agent learns adaptive, forecast-normalized bidding strategies with more conservative reporting in high-price regimes and counter-cyclical battery dispatch patterns, demonstrating effective coordination between profitability and risk control under volatile market conditions.
Ma et al. (Fri,) studied this question.
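The tail-risk metrics quoted in the abstract (VaR@95% and CVaR@95%) can be illustrated with a minimal empirical sketch. This is not the authors' implementation. The function names, the profit-scale sign convention (more negative means a worse tail), and the linear blend in `risk_adjusted_objective` are all assumptions made for illustration. The symbol η is borrowed only as a weight name, since the abstract does not state how η enters the reward.

```python
import numpy as np


def var_cvar(profits, alpha=0.95):
    """Empirical VaR and CVaR of a profit sample at confidence level alpha.

    Assumed convention: both values are reported on the profit scale,
    so a more negative number means a worse tail outcome.
    """
    profits = np.asarray(profits, dtype=float)
    # VaR: the (1 - alpha) quantile of the profit distribution.
    var = np.quantile(profits, 1.0 - alpha)
    # CVaR: mean of the outcomes at or below the VaR threshold.
    tail = profits[profits <= var]
    cvar = tail.mean()
    return var, cvar


def risk_adjusted_objective(profits, eta=0.35, alpha=0.95):
    """Hypothetical blend of mean profit and CVaR (an assumption, not the paper's reward).

    eta = 0 recovers the risk-neutral mean; eta = 1 scores the tail only.
    """
    _, cvar = var_cvar(profits, alpha)
    return (1.0 - eta) * np.mean(profits) + eta * cvar


if __name__ == "__main__":
    # Synthetic hourly profits for demonstration only (not data from the study).
    rng = np.random.default_rng(0)
    sample = rng.normal(loc=100.0, scale=300.0, size=2000)
    var95, cvar95 = var_cvar(sample)
    print("VaR@95%:", var95)
    print("CVaR@95%:", cvar95)
    print("risk-adjusted objective:", risk_adjusted_objective(sample))
```

Under this convention CVaR is never better than VaR, which matches the ordering of the reported values (−549 vs. −275 for the risk-constrained policy). The reported speedup can be checked the same way by hand: 10 s / 0.262 ms ≈ 3.8 × 10⁴ and 15 s / 0.262 ms ≈ 5.7 × 10⁴.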