What type of study is this?

This is an experimental study.

October 3, 2025

Open Access

Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models

Key Points

The proposed approach significantly lowers computational costs while maintaining accuracy.
Experiments on real-world document datasets show substantial improvements in efficiency.
Token pruning effectively filters non-informative background regions from document images.
A binary patch-level classifier removes non-text patches, and a max-pooling refinement step recovers fragmented text regions to improve spatial coherence.

Abstract

Recent progress in vision-language models (VLMs) has led to impressive results in document understanding tasks, but their high computational demands remain a challenge. To mitigate the compute burdens, we propose a lightweight token pruning framework that filters out non-informative background regions from document images prior to VLM processing. A binary patch-level classifier removes non-text areas, and a max-pooling refinement step recovers fragmented text regions to enhance spatial coherence. Experiments on real-world document datasets demonstrate that our approach substantially lowers computational costs, while maintaining comparable accuracy.
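
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 3x3 stride-1 pooling window, the row-major patch indexing, and the function names `max_pool_refine`, `kept_patch_indices`, and `prune_tokens` are all assumptions made for illustration, and the binary mask stands in for the output of the patch-level classifier. The sketch returns the original grid index of every surviving patch alongside its token, which is one plausible reading of "index-preserving" in the title.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # 1 = patch predicted as text, 0 = background


def max_pool_refine(mask: Mask, kernel: int = 3) -> Mask:
    """Stride-1 max pooling with 'same' padding over a binary patch mask.

    A patch is kept if any patch in its neighbourhood was predicted as text,
    which fills small gaps inside fragmented text regions.
    The kernel size is an assumed hyperparameter.
    """
    rows, cols = len(mask), len(mask[0])
    r = kernel // 2
    refined = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            refined[i][j] = max(
                mask[y][x]
                for y in range(max(0, i - r), min(rows, i + r + 1))
                for x in range(max(0, j - r), min(cols, j + r + 1))
            )
    return refined


def kept_patch_indices(mask: Mask) -> List[int]:
    """Row-major indices of the patches that survive pruning."""
    cols = len(mask[0])
    return [
        i * cols + j
        for i, row in enumerate(mask)
        for j, value in enumerate(row)
        if value
    ]


def prune_tokens(
    tokens: Sequence, mask: Mask, kernel: int = 3
) -> Tuple[list, List[int]]:
    """Drop background tokens; return survivors with their original indices."""
    keep = kept_patch_indices(max_pool_refine(mask, kernel))
    return [tokens[k] for k in keep], keep


# Hypothetical 4x6 patch grid: two isolated text predictions with a gap between.
raw_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
tokens = [f"patch_{k}" for k in range(24)]  # placeholder patch embeddings
kept_tokens, kept_indices = prune_tokens(tokens, raw_mask)
```

In this toy grid the refinement fills the gap between the two predicted text patches and adds a one-patch margin around them, so 15 of the 24 patches are passed on. Because each surviving token keeps its original grid index, a downstream model could still assign position information that matches the unpruned layout.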
