Human activity recognition (HAR) using wearable sensors plays a critical role in applications ranging from mobile health to smart environments. A key challenge is generalizing across heterogeneous sensor configurations and diverse activity taxonomies, especially when labeled data are scarce or deployment conditions vary. We propose Wonderwall, a virtual-to-real foundation model that achieves robust cross-scenario generalization for wearable HAR through a novel dual-stage pretraining paradigm. The first stage establishes biomechanical priors from physics-based simulated motion data, while the second stage refines these representations on curated real-world datasets through joint training and semantic alignment. A systematic data curation framework addresses dataset fragmentation by unifying activity labels, balancing class distributions, filtering low-quality samples, and removing duplicates. Moreover, a dynamic graph-based encoder enables adaptive modeling of sensor relationships across diverse configurations. Evaluated on ten heterogeneous HAR benchmarks, Wonderwall demonstrates state-of-the-art generalization in zero-shot, few-shot, and full-shot settings. An ablation study assesses the contribution of each core component, while a comparison of fine-tuning and probing strategies highlights the model's adaptability across data regimes. Furthermore, a sensing imaging experiment validates the alignment of Wonderwall's embeddings with semantic representations. These findings underscore the effectiveness of the virtual-to-real pretraining paradigm in achieving robust cross-scenario generalization.
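The abstract names four curation steps (label unification, quality filtering, deduplication, class balancing) without giving their details. As a rough illustration only, the following Python sketch shows one way such a pipeline could be chained. The label map, the quality heuristic (minimum length and non-constant signal), the per-class cap, and the function name `curate` are all assumptions made for this example and are not taken from the paper.

```python
import random
from collections import Counter

# Hypothetical mapping from dataset-specific label names to a unified taxonomy.
LABEL_MAP = {
    "walk": "walking",
    "walking": "walking",
    "jog": "running",
    "running": "running",
}


def curate(samples, min_len=4, cap_per_class=2, seed=0):
    """Curate a list of (label, readings) pairs.

    readings is a tuple of floats standing in for a sensor window.
    All thresholds here are illustrative assumptions.
    """
    seen = set()
    kept = []
    for label, readings in samples:
        # 1. Unify labels; drop samples outside the unified taxonomy.
        unified = LABEL_MAP.get(label.lower())
        if unified is None:
            continue
        # 2. Filter low-quality samples (assumed heuristic: too short or flat).
        if len(readings) < min_len or len(set(readings)) == 1:
            continue
        # 3. Deduplicate on (unified label, exact readings).
        key = (unified, tuple(readings))
        if key in seen:
            continue
        seen.add(key)
        kept.append(key)

    # 4. Balance classes by capping the number of samples per class.
    rng = random.Random(seed)
    rng.shuffle(kept)
    counts = Counter()
    balanced = []
    for label, readings in kept:
        if counts[label] < cap_per_class:
            counts[label] += 1
            balanced.append((label, readings))
    return balanced
```

Capping is the simplest balancing choice; a real pipeline might instead resample or reweight classes, and would likely deduplicate with a similarity measure rather than exact matching.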
Miao et al. studied this question.