Particle filters (PFs) have been widely used in speaker tracking because they can model non-linear processes and non-Gaussian environments. However, particle filters have several limitations. For example, they often rely on pre-defined handcrafted measurements, which can limit model performance. In addition, the transition and update models are often preset, which makes PFs harder to adapt to different scenarios. To address these issues, we propose an end-to-end differentiable particle filter framework that employs multi-head attention to model long-range dependencies. The proposed model uses self-attention as the learned transition model and cross-attention as the learned update model. To our knowledge, this is the first proposal to combine a particle filter with a transformer for speaker tracking, in which the measurement extraction, transition, and update steps are integrated into an end-to-end architecture. Experimental results show that the proposed model outperforms the recurrent baseline models.
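To make the roles of the two attention types concrete, the following is a minimal NumPy sketch of one filter step in which self-attention plays the part of the transition model and cross-attention plays the part of the update model. It is not the authors' architecture: it uses a single head with random, untrained weights, it assumes that self-attention is applied across the particle set, and the names `AttentionFilterStep`, `transition`, `update`, and `w_score`, along with all dimensions, are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query, key, value):
    """Single-head scaled dot-product attention."""
    scores = query @ key.T / np.sqrt(query.shape[-1])
    return softmax(scores, axis=-1) @ value


class AttentionFilterStep:
    """Hypothetical one-step filter: self-attention as the transition,
    cross-attention as the update. Weights are random stand-ins for
    parameters that would be learned end to end."""

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        # Projections for the transition (self-attention) step.
        self.wq_t, self.wk_t, self.wv_t = (
            rng.normal(0.0, scale, (dim, dim)) for _ in range(3)
        )
        # Projections for the update (cross-attention) step.
        self.wq_u, self.wk_u, self.wv_u = (
            rng.normal(0.0, scale, (dim, dim)) for _ in range(3)
        )
        # Assumed linear scoring vector that maps a particle to a log-weight.
        self.w_score = rng.normal(0.0, scale, (dim,))

    def transition(self, particles):
        """Propagate particles: each particle attends to all particles."""
        q = particles @ self.wq_t
        k = particles @ self.wk_t
        v = particles @ self.wv_t
        return particles + attention(q, k, v)  # residual connection

    def update(self, particles, measurements):
        """Fuse measurements: particles are queries, measurement
        features are keys and values. Returns particles and weights."""
        q = particles @ self.wq_u
        k = measurements @ self.wk_u
        v = measurements @ self.wv_u
        fused = particles + attention(q, k, v)
        weights = softmax(fused @ self.w_score)  # normalized particle weights
        return fused, weights


rng = np.random.default_rng(0)
step = AttentionFilterStep(dim=8, rng=rng)
particles = rng.normal(size=(16, 8))     # 16 particle embeddings
measurements = rng.normal(size=(5, 8))   # 5 measurement feature vectors

predicted = step.transition(particles)
updated, weights = step.update(predicted, measurements)
estimate = weights @ updated             # weighted mean of the particles
```

Because every operation here is a matrix product or a softmax, gradients can flow through both steps, which is the property an end-to-end differentiable filter depends on. A trained model would replace the random projections with learned ones and would typically use several heads.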
Jinzheng Zhao
Sun Yat-sen University
Yong Xu
Northwestern Polytechnical University
Xinyuan Qian
University of Science and Technology Beijing
IEEE Open Journal of Signal Processing
University of Surrey
University of Science and Technology Beijing
Bellevue Hospital Center
Zhao et al. studied this question.
synapsesocial.com/papers/6a216642cdf8429e7e5fa17f — DOI: https://doi.org/10.1109/ojsp.2024.3363649