Accurate early-stage assessment of building energy and carbon performance is essential for informed sustainable design, yet it remains challenging because of limited design detail and the effort that simulation requires. This study presents a Building Information Modeling–Machine Learning (BIM-ML) framework for predicting office building energy and carbon performance at early design stages using simulation-based datasets. A reduced-factorial Design of Experiments (DOE) generated 210 parametric office building models for Orlando, Florida (ASHRAE Climate Zone 2A), complemented by additional climate scenarios. Systematic variations in geometry, envelope, building systems, and operational schedules produced a dataset with 14 independent variables and five performance indicators: Energy Use Intensity, Operational Energy, Operational Carbon, Embodied Carbon, and Total Carbon. Four regression methods (Linear Regression, Model Tree (M5P), Sequential Minimal Optimization Regression, and Random Forest) were trained and evaluated using 10-fold cross-validation. Random Forest showed the strongest overall predictive performance. Feature-importance analysis identified HVAC system type, Window-to-Wall Ratio, and operational schedule as the most influential parameters, while geometric factors had lower impact. Cross-climate analysis and validation against measured data from two university office buildings indicate that the framework is adaptable and generalizable, supporting reliable early-stage evaluation of energy and carbon performance.
Paula et al. (Thu,) studied this question.
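The 10-fold cross-validation protocol mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic (only the sample count of 210 matches the study), the single predictor and its linear effect on Energy Use Intensity are hypothetical, and a one-variable least-squares fit stands in for the four regression methods actually compared. The helper names `k_fold_indices`, `fit_line`, and `cross_validated_mae` are not from the study.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cross_validated_mae(xs, ys, k=10, seed=0):
    """Mean absolute error on held-out data, averaged over the k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training folds, then score on the held-out fold.
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        errors = [abs(ys[i] - (intercept + slope * xs[i])) for i in held_out]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


# Synthetic stand-in data, NOT the study's dataset.
rng = random.Random(42)
wwr = [rng.uniform(0.2, 0.8) for _ in range(210)]    # hypothetical window-to-wall ratios
eui = [100 + 60 * w + rng.gauss(0, 5) for w in wwr]  # hypothetical EUI response

folds = k_fold_indices(len(wwr), k=10)
cv_mae = cross_validated_mae(wwr, eui, k=10)
print(f"fold sizes: {[len(f) for f in folds]}")
print(f"10-fold CV mean absolute error: {cv_mae:.2f}")
```

With 210 samples and 10 folds, each model is fitted on 189 samples and scored on the remaining 21, so every sample is used for testing exactly once. Comparing several regressors would amount to swapping `fit_line` for each candidate while keeping the same folds.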