ABSTRACT

Multi-view video captures objects from diverse viewpoints, which makes it increasingly useful in applications such as 3D reconstruction, object tracking, and augmented reality. Object segmentation in multi-view image environments, however, incurs high computational complexity and requires managing large-scale data. This paper proposes an efficient framework that addresses both challenges. The framework dynamically switches between lightweight and original models, and it employs a video multi-object segmenter with a low-rank projection matrix together with a lightweight mask refiner. Furthermore, the cosine similarity between consecutive frames is used to determine how much each target object moves or changes. This information drives an adaptive adjustment of the lightweight level for the current frame, enabling fine-grained model selection. In a multi-GPU setting, running lightweight and original models side by side can unbalance execution times across GPUs and introduce frame-level latency. To mitigate this, the framework optimally manages inter-GPU data transfers at every frame, taking the hardware connectivity between GPUs into account. As a result, per-frame segmentation time is balanced across GPUs, which minimizes the overall average per-frame execution time. Experimental results show that the framework keeps the IoU drop caused by lightweight model usage within 2.86% while reducing the average per-frame execution time by 34.3%.
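The abstract does not specify how the cosine similarity between consecutive frames is turned into a lightweight level. The following is a minimal sketch under stated assumptions, not the paper's method: it assumes a feature vector is available for each frame and that the level is chosen by comparing the similarity against fixed thresholds. The function names, the `THRESHOLDS` values, and the level numbering (0 = original model, higher = lighter model) are all hypothetical.

```python
import math
from typing import Sequence

# Hypothetical thresholds: the more similar two consecutive frames are
# (i.e. the less the target objects move or change), the lighter the model
# that is assumed to be safe to use for the current frame.
THRESHOLDS = (0.90, 0.95, 0.98)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_light_level(
    prev_feat: Sequence[float],
    curr_feat: Sequence[float],
    thresholds: Sequence[float] = THRESHOLDS,
) -> int:
    """Hypothetical mapping: count how many thresholds the similarity meets.

    Level 0 means "run the original model"; each threshold met raises the
    lightweight level by one.
    """
    sim = cosine_similarity(prev_feat, curr_feat)
    return sum(1 for t in thresholds if sim >= t)


if __name__ == "__main__":
    # Nearly static scene: high similarity, so the lightest level is selected.
    print(select_light_level([1.0, 0.0, 0.5], [1.0, 0.01, 0.5]))
    # Large change between frames: low similarity, so the original model is used.
    print(select_light_level([1.0, 0.0], [0.0, 1.0]))
```

Counting thresholds keeps the mapping monotonic: a frame can only move to a lighter model when its similarity to the previous frame increases. In practice, the thresholds would have to be tuned against the accuracy budget (for example, the reported IoU drop).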
Yong et al. (Sun,) studied this question.