Recently, smart speaker systems that combine microphones and loudspeakers have been gaining popularity, and their application to sound field reproduction has attracted increasing attention. Unlike conventional loudspeaker systems, smart speaker units are often placed freely, so their spatial configuration is generally unknown in advance. For object-based audio rendering, however, the positions of these units must be estimated to determine an appropriate rendering strategy. In previous work, a data-driven method was proposed to estimate the angular directions of the units based on voice directivity. In this study, we propose a data-driven approach that estimates the 3-D positions of smart speaker units from a simultaneous measurement. In the proposed method, all loudspeakers emit measurement signals at the same time, and the received signals are fed to a deep learning model that estimates the position of each loudspeaker. Simulation experiments are conducted to compare the estimation performance of different neural network architectures and to demonstrate the effectiveness of the proposed approach. Work partially supported by the Research Institute for Science and Technology of Tokyo Denki University Grant No. Q24J-04/ Japan.
Nishihara et al. (Wed,) studied this problem.
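The abstract does not specify the probe signals, the room model, or the network. Purely as an illustration, the sketch below shows what a simultaneous measurement might produce under loudly stated assumptions: free-field propagation with no reverberation, delays rounded to whole samples, 1/r attenuation, and hypothetical random ±1 probe sequences (one per loudspeaker). The names `make_probe_signals`, `simulate_simultaneous_measurement`, `FS`, and `SPEED_OF_SOUND` are hypothetical and introduced here for illustration only. Each microphone receives the superposition of all loudspeaker signals, which is the kind of mixture a position-estimation network could take as input. The network itself is not sketched.

```python
import math
import random

# Assumed constants (not given in the abstract).
SPEED_OF_SOUND = 343.0  # m/s
FS = 16000  # Hz, hypothetical sampling rate


def make_probe_signals(num_speakers, length, seed=0):
    """Hypothetical probe signals: one independent random +/-1 sequence per loudspeaker."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(length)]
            for _ in range(num_speakers)]


def simulate_simultaneous_measurement(speaker_positions, mic_positions, signals,
                                      fs=FS, c=SPEED_OF_SOUND):
    """Free-field sketch of a simultaneous measurement (assumption: no room reflections).

    Every loudspeaker starts playing its signal at t = 0. Each microphone receives
    the sum of all signals, each delayed by its propagation time (rounded to whole
    samples) and attenuated by 1/r. Returns one list of samples per microphone.
    """
    # Propagation delay in samples for every (microphone, loudspeaker) pair.
    delays = [[round(math.dist(s, m) / c * fs) for s in speaker_positions]
              for m in mic_positions]
    # Long enough to hold the longest signal after the largest delay.
    length = max(len(x) for x in signals) + max(max(row) for row in delays)
    received = []
    for m_idx, m in enumerate(mic_positions):
        y = [0.0] * length
        for s_idx, (s, x) in enumerate(zip(speaker_positions, signals)):
            gain = 1.0 / max(math.dist(s, m), 1e-3)  # spherical spreading, clamped near r = 0
            d = delays[m_idx][s_idx]
            for n, v in enumerate(x):
                y[n + d] += gain * v  # superposition of all loudspeakers at this mic
        received.append(y)
    return received


# Usage with hypothetical positions in metres.
speakers = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]  # ground-truth targets for the estimator
mics = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0)]  # hypothetical two-microphone array
signals = make_probe_signals(len(speakers), 256)
x_in = simulate_simultaneous_measurement(speakers, mics, signals)
# x_in (num_mics x num_samples) would be one network input; `speakers` the target.
```

Distinct random sequences are used here only because they are roughly uncorrelated, which keeps the loudspeaker contributions distinguishable in the mixture; the actual measurement signals and any reverberant room simulation used in the study may differ.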