Abstract

Residential and building-level electricity forecasting is increasingly based on smart-meter streams, but direct data pooling is often difficult because load traces are private, geographically dispersed and statistically different across households and buildings. This paper studies the effect of such client heterogeneity in federated energy forecasting. A multi-source benchmark is constructed by combining processed Smart, CU-BEMS, UCI Household and AMPds clients, giving 22 usable clients and 1,522,510 observations at 15-minute resolution. The benchmark compares non-federated baselines (Persistence, Ridge, LocalOnly and Centralised training), standard federated baselines (FedAvg, FedProx and FedPer) and the proposed HAPFL framework over 1-step, 12-step and 24-step forecasting horizons. HAPFL separates a shared temporal encoder from client-specific prediction heads and combines proximal stabilisation, latent prototype alignment and difficulty-aware aggregation. In the reported benchmark protocol, HAPFL gives the lowest mean MAE and the highest mean R² at all three horizons. At horizon 1, mean MAE improves from 0.208 to 0.196 and mean R² improves from 0.741 to 0.769 relative to centralised training, while the worst-10% client MAE is reduced from 0.301 to 0.274. At horizons 12 and 24, the corresponding MAE reductions over centralised training are 6.11% and 7.60%, respectively. The results indicate that vanilla federated averaging is not adequate for strongly non-identically distributed energy clients, while personalised and heterogeneity-aware federated learning is a more suitable direction for decentralised household and building load forecasting.
Das et al. (Mon,) studied this question.
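The summary statistics quoted in the abstract (relative MAE reduction and worst-10% client MAE) can be illustrated with a minimal sketch. The function names, and the convention of rounding the worst-10% group up to a whole number of clients, are assumptions made here for illustration; the abstract does not state how the tail group is formed.

```python
import math


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - proposed) / baseline


def worst_fraction_mean(client_mae, fraction=0.10):
    """Mean MAE over the worst-performing fraction of clients.

    Assumption: the group size is ceil(fraction * n_clients), with at
    least one client. With 22 clients this selects 3 clients.
    """
    k = max(1, math.ceil(fraction * len(client_mae)))
    worst = sorted(client_mae, reverse=True)[:k]
    return sum(worst) / k


# Horizon-1 mean MAE figures from the abstract: 0.208 (centralised) vs 0.196.
print(round(relative_reduction(0.208, 0.196), 2))  # prints 5.77
```

Applied to the horizon-1 figures, the first function gives a reduction of about 5.77%, which is of the same order as the 6.11% and 7.60% reported for horizons 12 and 24. The second function needs per-client MAE values, which are not given in the abstract.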