Wearable accelerometer-based Human Activity Recognition (HAR) is a key technology for activity tracking and health monitoring. In practice, however, it is hampered by the need for large labeled datasets and by variability in sensor placement [1,2]. To learn meaningful representations from unlabeled accelerometer data, this work presents a self-supervised learning (SSL) technique based on the Transformer architecture [3]. The method uses a masked-reconstruction pretext task that is augmented with controlled noise injection. The model is pre-trained on the unlabeled Capture24 dataset and then fine-tuned on three labeled datasets: WISDM, REALWORLD, and OPPORTUNITY. The results show that, compared with training from scratch, fully fine-tuning the pre-trained model yields an average improvement of 18.1% in F1-score. The model also shows a reasonable capacity to generalize across sensor locations, particularly when fine-tuned with a small amount of labeled data from a new location. The proposed model delivers dependable performance despite its small size (fewer than 100,000 parameters and 30 million FLOPs), making it suitable for deployment on resource-constrained wearable devices. These results suggest that Transformer-based SSL can retain strong performance across a range of users and sensor setups while greatly reducing reliance on labeled data.
Choudhary et al. (Wed,) studied this question.
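To make the pretext task named in the abstract more concrete, the following is a minimal sketch of how a masked-reconstruction example with noise injection could be built from one accelerometer window. It is an illustration under stated assumptions, not the authors' implementation: the function names `make_pretext_example` and `masked_mse`, the masking ratio, the noise level, the zero-fill of masked steps, the use of Gaussian noise, and the choice to score the loss only on masked steps are all hypothetical and do not come from the source.

```python
import numpy as np


def make_pretext_example(window, mask_ratio=0.15, noise_std=0.05, rng=None):
    """Build one (corrupted input, target, mask) triple from a (T, C) window.

    All default values are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_steps = window.shape[0]

    # Pick a random subset of time steps to hide from the model.
    num_masked = max(1, int(round(mask_ratio * num_steps)))
    masked_idx = rng.choice(num_steps, size=num_masked, replace=False)
    mask = np.zeros(num_steps, dtype=bool)
    mask[masked_idx] = True

    # Assumed form of "controlled noise injection": small Gaussian noise
    # added to the samples that remain visible.
    noisy = window + rng.normal(0.0, noise_std, size=window.shape)

    # Masked time steps are zero-filled across all channels (an assumption).
    corrupted = np.where(mask[:, None], 0.0, noisy)
    return corrupted, window, mask


def masked_mse(prediction, target, mask):
    """Mean squared reconstruction error over the masked time steps only."""
    diff = prediction[mask] - target[mask]
    return float(np.mean(diff ** 2))
```

In such a setup, an encoder (a Transformer in the work described here) would receive `corrupted` and be trained to minimize `masked_mse` between its output and the clean `window`, so that no activity labels are needed during pre-training.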