Although Convolutional Neural Networks (CNNs) have achieved remarkable accuracy in intelligent tasks, their increasing complexity hinders low-latency execution. While edge computing mitigates the wide-area network delays typical of cloud-based inference, it remains constrained by limited computational resources when processing complex models under high concurrency. Collaborative inference has emerged as a promising paradigm to address these limitations; however, existing approaches often struggle with rigid routing, limited scalability, and inefficient resource utilization. In this paper, we propose a novel collaborative inference acceleration mechanism that integrates In-Network Computing (INC) within an Information-Centric Networking (ICN) framework. By leveraging the name-based resolution capability of ICN, our approach dynamically harnesses underutilized computational resources across distributed INC nodes, enabling flexible layer-wise offloading that is not bound to static IP paths. Furthermore, a distributed decision-making and node-selection algorithm is designed to orchestrate CNN layer assignment based on real-time network conditions and node workloads. Extensive simulations on representative models demonstrate the effectiveness of the proposed method. Specifically, for the computationally intensive VGG16 model under high concurrency, the average task completion time is reduced by 43.3% and 60.2% relative to the IP-based and Edge-Cloud baselines, respectively, while the load-balancing fairness index remains above 0.86.
Hu et al. (Wed,) studied this question.
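
The abstract reports a load-balancing fairness index above 0.86 without defining it in this section. One widely used metric of this kind is Jain's fairness index, J = (Σ xᵢ)² / (n · Σ xᵢ²), which equals 1 when all n nodes carry equal load and falls to 1/n when a single node carries everything. The sketch below assumes that definition; whether the paper uses Jain's index is an assumption, and the function name and the sample per-node loads are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def jain_fairness_index(loads: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Assumption: this is the fairness metric meant in the abstract;
    the source text does not state which index is used.
    """
    n = len(loads)
    if n == 0:
        raise ValueError("loads must be non-empty")
    total = sum(loads)
    sum_sq = sum(x * x for x in loads)
    if sum_sq == 0:
        # All nodes idle: treat the allocation as perfectly balanced.
        return 1.0
    return (total * total) / (n * sum_sq)


if __name__ == "__main__":
    # Hypothetical per-node workloads (e.g. assigned CNN layers' FLOPs).
    balanced = [4.0, 4.0, 4.0, 4.0]
    skewed = [8.0, 0.0, 0.0, 0.0]
    print(jain_fairness_index(balanced))
    print(jain_fairness_index(skewed))
```

Under this definition, a value above 0.86 would indicate that the per-node workloads are fairly close to uniform across the participating INC nodes.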