Recent studies have made significant progress in acoustic-based gesture recognition. However, existing methods cannot be extended to customized gestures and do not adapt well to different practical environments. We propose a highly scalable gesture recognition system called EchoGest, which integrates a well-designed feature-wise transformation layer into a prototypical network framework and recognizes unseen gestures using only a device's built-in speaker and microphone. Our key insight is to measure the similarity between query sample representations and class prototypes in the embedding space, thereby enabling scalability to unseen gestures. In addition, we introduce a feature transformation layer that linearly adjusts feature maps, and we propose an efficient two-stage training strategy to obtain regularized parameters for this layer. Specifically, the layer applies an affine transformation to intermediate feature activations to yield more diverse feature distributions for cross-domain recognition, and it improves recognition accuracy by 10% in 1-shot cases. We train the system on a collected dataset of letter gestures (i.e., writing 'A' to 'Z') and test it on a dataset of digit gestures (i.e., writing '0' to '9') collected from 10 volunteers. The results show that EchoGest recognizes unseen digit gestures with an accuracy of 93.7% in 2-shot cases and 93.2% in the leave-one-user-out testing setting. We also explore a semi-supervised clustering approach in which each user's data can be used to update his or her prototypes for personalized customization. Comprehensive experiments further verify that EchoGest maintains good performance across various environments, age groups, and devices.
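To make the two core ideas concrete, the following is a minimal sketch of prototype-based classification and a feature-wise affine transformation, not the EchoGest implementation. Several details are assumptions that the abstract does not state: the squared Euclidean distance (the usual choice in prototypical networks), the softmax over negative distances, the Gaussian sampling of the scale and shift parameters, and all function names, which are hypothetical. Embeddings are plain lists of floats standing in for the output of a trained encoder.

```python
import math
import random


def prototypes(support):
    """Average the support embeddings of each class into one prototype.

    support: dict mapping a class label to a list of embedding vectors.
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def squared_distance(a, b):
    """Squared Euclidean distance (assumed metric, not confirmed by the abstract)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, protos):
    """Return the nearest-prototype label and a softmax over negative distances."""
    labels = list(protos)
    logits = [-squared_distance(query, protos[label]) for label in labels]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


def feature_wise_transform(features, gamma, beta):
    """Affine adjustment per feature channel: gamma * f + beta."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]


def sample_affine_params(dim, gamma_std, beta_std, rng):
    """Assumed sampling scheme: gamma ~ N(1, gamma_std), beta ~ N(0, beta_std)."""
    gamma = [rng.gauss(1.0, gamma_std) for _ in range(dim)]
    beta = [rng.gauss(0.0, beta_std) for _ in range(dim)]
    return gamma, beta


# 2-shot example with two unseen classes and toy 2-D embeddings.
support = {
    "0": [[0.0, 0.1], [0.2, -0.1]],
    "1": [[1.0, 1.1], [0.8, 0.9]],
}
protos = prototypes(support)
label, probs = classify([0.9, 1.0], protos)

# Perturb an intermediate feature vector as the transformation layer might during training.
gamma, beta = sample_affine_params(2, 0.3, 0.5, random.Random(0))
augmented = feature_wise_transform([0.9, 1.0], gamma, beta)
```

Because a new class only requires averaging a few support embeddings, adding a gesture needs no retraining of the encoder, which is what makes this style of classifier scale to unseen gestures.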
Wang et al. studied this question.