In this paper, we propose to improve action recognition accuracy by combining synthetic data with domain adaptation. Specifically, we create a synthetic dataset that mimics the Multi-View Extended Video with Activities (MEVA) dataset and introduce a multi-modal model for domain adaptation. This synthetic-to-real adaptation approach uses the synthetic data to improve model generalization. First, we focus on creating and utilizing synthetic datasets generated with a high-fidelity, physically based rendering system. The sensor simulation incorporates domain randomization and photo-realistic rendering to reduce the domain gap between synthetic and real data, addressing the persistent scarcity of real data in action recognition. Complementing the synthetic dataset generation, we use multi-modal models that take RGB images and skeleton features as input in the synthetic-to-real adaptation experiments. Our experiments show that even relatively straightforward techniques, such as synthetic data pre-training, improve the models. Our work highlights the effectiveness of the approach and its practical applications across various domains, including surveillance systems, threat identification, and disaster response.
Lu et al. (Fri,) studied this question.