ABSTRACT Human activity recognition (HAR) using wearable sensors has become a major research focus because of its broad range of applications, including healthcare monitoring, smart homes and human‐computer interaction. Recognising activities accurately from multivariate sensor data remains difficult, however, because sensor signals are noisy, features are often redundant and temporal dependencies are complex. In this paper, we propose a deep learning method that addresses these issues by combining sensor‐to‐image conversion, feature‐level fusion, dimensionality reduction and multi‐scale classification. First, raw multivariate sensor signals are transformed into structured image representations using spectrogram‐based encoding, which allows convolutional neural networks to capture spatial patterns in temporal data. Two complementary deep architectures, Inception and Xception, then extract features from the generated images. For feature‐level fusion, the feature vectors extracted by the two networks are concatenated to exploit their complementary information. Principal component analysis (PCA) is then applied to obtain a compact reduced fused feature (RFF) representation, which lowers feature redundancy and computational complexity. The reduced feature space is processed by a shared multi‐scale convolutional front‐end with kernel sizes of 3, 5 and 7, after which CNN, LSTM and RNN classifiers model spatiotemporal activity patterns. In experiments on the WISDM, UCI‐HAR and PAMAP2 datasets, the proposed method achieves accuracies of 98.88%, 98.71% and 98.71%, respectively.
Gautam et al. (Thu,) studied this question.
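The fusion, reduction and multi‐scale steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature arrays are random placeholders standing in for Inception and Xception outputs, the sizes (32 samples, 64 features per network, 16 principal components) are assumptions chosen only for illustration, and the function names `fuse_features`, `pca_reduce` and `multi_scale_conv` are hypothetical. The convolution filters are random rather than learned, and only the kernel sizes 3, 5 and 7 come from the abstract.

```python
import numpy as np


def fuse_features(feats_a, feats_b):
    """Feature-level fusion: concatenate the per-sample feature vectors."""
    return np.concatenate([feats_a, feats_b], axis=1)


def pca_reduce(features, n_components):
    """Project features onto their top principal components using SVD."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def multi_scale_conv(rff, kernels):
    """Apply one 1D filter per kernel size and stack the branch outputs.

    np.convolve flips the kernel (true convolution), whereas CNN layers
    usually compute cross-correlation; for a sketch the difference is minor.
    mode="same" keeps the output length equal to the input length, provided
    each row is at least as long as the kernel.
    """
    branches = [
        np.stack([np.convolve(row, k, mode="same") for row in rff])
        for k in kernels
    ]
    return np.stack(branches, axis=1)  # shape: (samples, scales, features)


rng = np.random.default_rng(0)

# Placeholder features; real ones would come from the two pretrained networks.
inception_feats = rng.normal(size=(32, 64))
xception_feats = rng.normal(size=(32, 64))

fused = fuse_features(inception_feats, xception_feats)  # (32, 128)
rff = pca_reduce(fused, n_components=16)                # (32, 16)

# One random filter per kernel size named in the abstract (assumed untrained).
kernels = [rng.normal(size=k) for k in (3, 5, 7)]
multi_scale = multi_scale_conv(rff, kernels)            # (32, 3, 16)

print(fused.shape, rff.shape, multi_scale.shape)
```

In a full pipeline the stacked multi‐scale output would feed the CNN, LSTM and RNN classifiers, and the PCA projection would be fitted on training data only and then reused for test data.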