Tracking and following objects of interest is critical to several robotics use cases, ranging from industrial automation to logistics and warehousing, to healthcare and security. In this paper, we present a robotic system to detect, track, and follow any object in real time. Our approach, dubbed follow anything (FAn), is an open-vocabulary and multimodal model: it is not restricted to concepts seen at training time and can be applied to novel classes at inference time using text, images, or click queries. Leveraging rich visual descriptors from large-scale pre-trained models (foundation models), FAn can detect and segment objects by matching multimodal queries (text, images, clicks) against an input image sequence. These detected and segmented objects are tracked across image frames, all while accounting for occlusion and object re-emergence. We demonstrate FAn on a real-world robotic system (a micro aerial vehicle), and report its ability to seamlessly follow the objects of interest in a real-time control loop. FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second. To enable rapid adoption, deployment, and extensibility, we open-source our code on our project webpage. We also encourage the reader to watch our 5-minute explainer video.
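The query-matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the query (text, image, or click) and each segmented region have already been embedded into a shared descriptor space by some pre-trained model, and that matching is done by cosine similarity. The names `cosine_similarity` and `match_query`, the toy vectors, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_query(query_embedding, segment_descriptors, threshold=0.5):
    """Return (index, score) of the best-matching segment.

    The index is None when no segment reaches the (assumed) threshold,
    which a tracker could treat as "object not visible in this frame".
    """
    best_index, best_score = None, -1.0
    for i, descriptor in enumerate(segment_descriptors):
        score = cosine_similarity(query_embedding, descriptor)
        if score > best_score:
            best_index, best_score = i, score
    if best_score < threshold:
        return None, best_score
    return best_index, best_score


# Toy example: three segment descriptors in a 3-dimensional space (made-up values).
query = [1.0, 0.0, 0.0]
segments = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
index, score = match_query(query, segments)
print(index, round(score, 3))
```

In a sketch like this, a frame in which no segment passes the threshold returns `None`, and a tracking loop could hold the last known target until a match reappears. How FAn itself handles occlusion and re-emergence is not specified in the abstract.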
Alaa Maalouf
Citigroup
Ninad Jadhav
Citigroup
Krishna Murthy Jatavallabhula
Moscow Institute of Thermal Technology
IEEE Robotics and Automation Letters
Harvard University
Massachusetts Institute of Technology
Citigroup
Maalouf et al. (Wed,) studied this question.
synapsesocial.com/papers/68e792c7b6db643587703cbb — DOI: https://doi.org/10.1109/lra.2024.3366013