Human Activity Recognition (HAR) is an important task for health monitoring and lifelog applications. HAR based on wearable sensors can be a low-cost and effective solution. With the increasing demand for HAR in recent years, greater attention must be paid to complex, high-level semantic behaviors. To this end, we suggest describing complex human activity in terms of sub-action time combination and sub-action frequency decomposition. We propose a deep-learning framework named "Convolution and Spatial Long-short Term Memory (CSLSTM)" for complex HAR. CSLSTM consists of a downsample module and a spatial distribution encoding module. After the raw sensor data (accelerometer, gyroscope, magnetometer) are converted into time-frequency spectrograms, the downsample module uses residual convolution blocks to downsample the spectrogram and extract local texture features of its frequency responses. The spatial distribution encoding module, an LSTM network operating in the 2D time-frequency domain, mines the time-frequency distribution of those frequency responses. In this framework, the LSTM no longer processes time-series data directly; it processes 2D time-frequency spectrogram data instead. We evaluate the performance of CSLSTM on two widely used datasets, ExtraSensory and PAMAP2. Experimental results show that CSLSTM reaches F1 scores of 79.45% and 99.22% on the ExtraSensory and PAMAP2 datasets, respectively.
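The preprocessing step mentioned above, turning a raw sensor channel into a time-frequency spectrogram, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the window type, frame length, hop size, or sampling rate, so the Hann window, `frame_len=32`, `hop=16`, and the 32 Hz synthetic signal below are all assumptions chosen for illustration, and the naive DFT stands in for whatever transform the paper actually uses.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (assumed window type)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_len=32, hop=16):
    """Magnitude spectrogram computed with a naive short-time DFT.

    Returns a list of frames (time axis); each frame holds
    frame_len // 2 + 1 magnitudes (frequency bins 0 .. Nyquist).
    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the current frame to reduce spectral leakage.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = 0j
            for n, x in enumerate(chunk):
                acc += x * cmath.exp(-2j * math.pi * k * n / frame_len)
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Synthetic stand-in for one accelerometer axis: a 4 Hz tone sampled at 32 Hz.
fs = 32
signal = [math.sin(2.0 * math.pi * 4.0 * t / fs) for t in range(128)]

spec = spectrogram(signal)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

With these assumed settings each bin spans `fs / frame_len` = 1 Hz, so the strongest bin of the test tone falls at index 4. The resulting 2D array (time frames by frequency bins) is the kind of input that the downsample module and the 2D LSTM described in the abstract would then consume.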
Tian et al. (Fri,) studied this question.