Abstract This paper introduces a method for distributed multi-robot formation control that combines audio-visual inter-agent sensing with Bluetooth communication. We propose a hierarchical control scheme for ESP32-based robots, consisting of a local formation controller on each robot and a centralized controller. The scheme uses a two-thread architecture that runs audio processing and display alongside video playback without sacrificing synchronous motor output. To adjust robot positioning dynamically, we developed an adaptive formation system that uses acoustic signatures and visual landmarks acquired in situ. Inter-robot communication is carried over Bluetooth using a time-division multiple access (TDMA) protocol designed for low latency. Experiments with line, wedge, and circle formations show that the shape can be maintained to within ±15 cm RMS error. With control loop frequencies maintained above 50 Hz, the system achieves an audio packet delivery rate of 92%. Distributed sensing reduces per-robot power consumption by approximately 34% relative to a traditional centralized baseline (8.2 W versus 12.5 W per robot). The system tolerates temporary sensor occlusions of up to 2 seconds through dead reckoning supported by acoustic observations; longer occlusions remain a limitation because dead-reckoning drift accumulates. Field experiments show that swarms of 3 to 12 robots can be deployed safely and that the platform withstands communication disruptions, short-term sensor occlusions, and dynamic challenges.
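To make the time-division multiple access idea concrete, the following is a minimal sketch of how fixed transmit slots could be assigned to robots in a round-robin frame. It is an illustration only, not the protocol developed in this paper: the class name `TdmaSchedule`, the equal-length slots, and the 5 ms slot duration are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TdmaSchedule:
    """Round-robin TDMA frame with one equal-length slot per robot.

    Hypothetical illustration; slot layout and timing are assumptions,
    not values taken from the paper.
    """

    num_robots: int  # swarm size (the abstract reports 3 to 12 units)
    slot_ms: float   # assumed slot length in milliseconds

    @property
    def frame_ms(self) -> float:
        # One frame gives every robot exactly one transmit slot.
        return self.num_robots * self.slot_ms

    def owner_at(self, t_ms: float) -> int:
        """Return the ID of the robot allowed to transmit at time t_ms."""
        return int((t_ms % self.frame_ms) // self.slot_ms)

    def next_slot_start(self, robot_id: int, t_ms: float) -> float:
        """Return the earliest time >= t_ms at which robot_id's slot begins."""
        frame_start = (t_ms // self.frame_ms) * self.frame_ms
        start = frame_start + robot_id * self.slot_ms
        return start if start >= t_ms else start + self.frame_ms


# Example values chosen for illustration: 4 robots, 5 ms slots.
schedule = TdmaSchedule(num_robots=4, slot_ms=5.0)
```

With these example values the frame lasts 20 ms, which happens to equal one period of a 50 Hz control loop; that match comes from the numbers chosen for the illustration and is not a reported design parameter. Because each robot transmits only in its own slot, transmissions do not collide, and the worst-case wait before a robot may send is bounded by one frame.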
Gupta et al. (Mon,) studied this question.