What question did this study set out to answer?

The survey aims to explore methods for deploying multimodal large language models at the network edge, focusing on overcoming computational and memory challenges.

June 17, 2026 | Open Access

A survey on edge multimodal large models: Compression, inference acceleration, and applications

Key Points

Categorizes literature based on model-level compression and system-level inference acceleration.
Reviews existing approaches addressing deployment challenges in heterogeneous edge environments.
Examines practical applications and research directions for edge-deployed multimodal large models.
Highlights the importance of architectural design and parameter reduction for model-level compression.
Discusses runtime optimizations to improve inference speed and resource management.
Identifies emerging research directions such as edge-native model architectures for enhanced efficiency.

Abstract

Deploying multimodal large language models (MLLMs) at the network edge is critical for enabling low-latency, privacy-preserving multimodal intelligence. However, the substantial computational and memory demands of MLLMs present significant challenges for deployment on heterogeneous and resource-constrained edge devices. This survey systematically reviews existing approaches aimed at addressing these challenges. We categorize the literature along two complementary dimensions: model-level compression, which focuses on efficient architectural design and parameter reduction, and system-level inference acceleration, which emphasizes runtime optimizations such as scheduling and resource management. In addition, the survey examines the practical applications of edge-deployed MLLMs in domains such as cyber intelligence and embodied intelligence, and discusses emerging research directions, including edge-native model architectures, to further improve the trade-off between intelligence capability and resource efficiency.
