What type of study is this?

This is a quantitative study.

October 8, 2025 · Open Access

Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

Key Points

Compressed large language models largely maintain workflow generation, with minimal (1%-3%) accuracy drops under 4-bit quantization.
4-bit quantization reduced real-world application accuracy by 10%-15% while preserving tool use.
The Agent Compression Benchmark (ACBench) offers 12 tasks across 4 capabilities for assessing how compression affects LLMs.
ERank, together with Top-k Ranking Correlation and Energy, systematizes the analysis of compression trade-offs in agentic scenarios to inform deployment.
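ERank is named here but not defined on this page. Assuming it follows the standard effective-rank definition (the exponential of the Shannon entropy of a matrix's normalized singular values), a minimal NumPy sketch is shown below. The function name `effective_rank` and the rank-8 truncation used as a stand-in for "compression" are illustrative assumptions, not ACBench's implementation; the paper may apply ERank to different matrices (e.g., hidden states) or with additional details.

```python
import numpy as np


def effective_rank(matrix: np.ndarray) -> float:
    """Effective rank: exp(Shannon entropy of the normalized singular values).

    Assumption: standard effective-rank definition; ACBench's ERank may
    differ in which matrices it is computed on or in further details.
    """
    s = np.linalg.svd(matrix, compute_uv=False)  # singular values, all >= 0
    total = s.sum()
    if total == 0:
        return 0.0  # convention chosen here for the all-zero matrix
    p = s / total  # treat singular values as a probability distribution
    p = p[p > 0]  # 0 * log(0) is taken as 0
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


# Hypothetical comparison: a random "weight" matrix vs. a rank-8 truncation
# of it, standing in for an original vs. a compressed representation.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
u, sv, vt = np.linalg.svd(w)
w_low = (u[:, :8] * sv[:8]) @ vt[:8]

print(effective_rank(np.eye(4)))  # ≈ 4.0: four equal singular values
print(effective_rank(w), effective_rank(w_low))  # the truncated matrix scores at most 8
```

For a nonzero matrix the effective rank lies between 1 and the matrix's rank, so a lower value after compression can be read as information concentrating in fewer directions.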

Abstract

Post-training compression reduces the computational and memory costs of large language models (LLMs), enabling resource-efficient deployment. However, existing compression benchmarks focus only on language modeling (e.g., perplexity) and natural language understanding tasks (e.g., GLUE accuracy), ignoring agentic capabilities: workflow generation, tool use/function calling, long-context understanding, and real-world application. We introduce the Agent Compression Benchmark (ACBench), the first comprehensive benchmark for evaluating how compression affects LLMs' agentic abilities. ACBench spans (1) 12 tasks across 4 capabilities (e.g., WorfBench for workflow generation, Needle-in-Haystack for long-context retrieval), (2) quantization (GPTQ, AWQ) and pruning (Wanda, SparseGPT) methods, and (3) 15 models, including small (Gemma-2B), standard (Qwen2.5 7B-32B), and distilled reasoning LLMs (DeepSeek-R1-Distill). Our experiments reveal compression trade-offs: 4-bit quantization preserves workflow generation and tool use (a 1%-3% drop) but degrades real-world application accuracy by 10%-15%. To systematize the analysis, we introduce ERank, Top-k Ranking Correlation, and Energy. ACBench provides actionable insights for optimizing LLM compression in agentic scenarios. The code is available at https://github.com/pprp/ACBench.
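The abstract's central finding concerns 4-bit quantization. To make the term concrete, here is a minimal NumPy sketch of plain round-to-nearest (RTN) 4-bit quantization with a per-group scale and zero-point. The function names `quantize_rtn_4bit` and `dequantize` and the group size of 128 are illustrative assumptions. This is deliberately simpler than the methods ACBench evaluates: GPTQ compensates rounding error using second-order (Hessian-based) information, and AWQ rescales salient weight channels according to activation statistics before quantizing.

```python
import numpy as np


def quantize_rtn_4bit(w: np.ndarray, group_size: int = 128):
    """Asymmetric round-to-nearest 4-bit quantization, one scale/zero-point per group.

    Illustrative baseline only; not GPTQ or AWQ.
    """
    rows, cols = w.shape
    if cols % group_size != 0:
        raise ValueError("number of columns must be divisible by group_size")
    qmax = 2**4 - 1  # 4-bit unsigned codes: 0..15
    g = w.reshape(rows, cols // group_size, group_size)
    # Include 0 in the range so the zero-point is itself a valid 4-bit code.
    w_min = np.minimum(g.min(axis=-1, keepdims=True), 0.0)
    w_max = np.maximum(g.max(axis=-1, keepdims=True), 0.0)
    scale = (w_max - w_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # all-zero group: any scale works
    zero = np.round(-w_min / scale)
    # Codes are stored unpacked in uint8 for clarity; real kernels pack two per byte.
    q = np.clip(np.round(g / scale) + zero, 0, qmax).astype(np.uint8)
    return q, scale, zero


def dequantize(q, scale, zero, shape):
    """Map 4-bit codes back to approximate floating-point weights."""
    return ((q.astype(scale.dtype) - zero) * scale).reshape(shape)


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 256))
q, scale, zero = quantize_rtn_4bit(w, group_size=128)
w_hat = dequantize(q, scale, zero, w.shape)
print("max abs error:", np.abs(w - w_hat).max())
```

Packed two codes per byte, each weight takes 4 bits instead of the 16 bits of FP16, plus the overhead of one scale and zero-point per group. The reconstruction error per weight is bounded by half the group's scale, which is why small groups trade extra overhead for accuracy.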


Cite this study

Dong et al. studied this question.

synapsesocial.com/papers/68e6bc5f38ca8e474d549d00 · DOI: https://doi.org/10.48550/arxiv.2505.19433

Also consider

Synapse lists 5 closely related papers on similar research questions. Consider them for comparative context:

Compression Represents Intelligence Linearly · 2024 · 2 citations
Ranking LLMs by compression · 2024
ACON: Optimizing Context Compression for Long-horizon LLM Agents · 2025
Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression · 2024
Designing Large Foundation Models for Efficient Training and Inference: A Survey · 2024 · 5 citations

Authors

Peijie Dong

National University of Defense Technology

Zhenheng Tang

Hong Kong University of Science and Technology

Xiang Liu

Zhejiang DongFang Vocational and Technical College
