Robust and adaptive operation of a video conferencing system can be realized through proper use of a network of video cameras and microphone arrays. Such multimodal sensory modules can provide valuable information about the geometrical and acoustical properties of an environment and can allow real-time monitoring of dynamic activity in that environment. In this paper, we present an overview of research activities focused on using omnidirectional camera networks for geometrical and environmental modeling and microphone arrays for speaker localization.
Trivedi et al. studied this question.