What type of study is this?

This is an experimental study.

March 15, 2026 | Open Access

DocCLSNMMH: A Benchmark for Native Multi-Modal Hybrid Document Classification in Enterprise Data Security Governance

Key points

Addressed the lack of benchmarks for native multi-modal hybrid document classification in enterprise data security.
Introduced the benchmark dataset DocCLS_NMMH including an out-of-distribution test subset.
Assessed accuracy degradation in heterogeneous documents and few-shot scenarios.
Evaluated current state-of-the-art models like LayoutLM and training-free models on the dataset.
LayoutLM achieved state-of-the-art performance with 98.66% accuracy on DocCLS_NMMH.
Approximately 7% accuracy degradation noted on the OOD test subset.
Training-free models consistently exceeded 95% accuracy across the full dataset.

Abstract

In enterprise data security governance, document AI has emerged as a mission-critical component that underpins the prevention of document leakage through automatic, accurate classification and identification of sensitive content. This creates a need to bring document classification benchmarks closer to real-world engineering applications. This paper identifies the lack of public datasets for native multi-modal hybrid document classification and accordingly proposes the dataset DocCLSNMMH (Native Multi-Modal Hybrid Document Classification) along with its out-of-distribution (OOD) test subset. An experimental study on the proposed dataset demonstrates that current benchmarks have become irrelevant and need to be updated to evaluate native multi-modal hybrid documents. Accuracy degradation on heterogeneous documents and in few-shot scenarios is also assessed, as both are prevalent in practice. The experimental results demonstrate that LayoutLM achieves state-of-the-art (SOTA) performance with 98.66% accuracy on DocCLSNMMH, with only approximately 7% accuracy degradation on its OOD test subset, while training-free models (Qwen2.5-VL-32B and Gemma3-27B) consistently achieve over 95% accuracy across the full dataset. The SOTA performance of these models on our benchmark provides effective guidance for model selection in real engineering applications.
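The abstract reports an accuracy degradation of approximately 7% on the OOD subset without stating here whether that figure is an absolute drop in percentage points or a drop relative to the in-distribution accuracy. The sketch below shows how both readings could be computed from predictions and gold labels. It is a minimal illustration, not the authors' evaluation code: the function names, the class labels, and the toy data are all hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def degradation(in_dist_acc, ood_acc):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute_points = (in_dist_acc - ood_acc) * 100
    relative_percent = (in_dist_acc - ood_acc) / in_dist_acc * 100
    return absolute_points, relative_percent


# Toy labels invented for illustration only (not taken from the paper).
in_dist_labels = ["contract", "invoice", "report", "invoice"]
in_dist_preds = ["contract", "invoice", "report", "invoice"]
ood_labels = ["contract", "invoice", "report", "invoice"]
ood_preds = ["contract", "invoice", "report", "contract"]

in_acc = accuracy(in_dist_preds, in_dist_labels)
ood_acc = accuracy(ood_preds, ood_labels)
abs_drop, rel_drop = degradation(in_acc, ood_acc)

print(f"in-distribution: {in_acc:.2%}, OOD: {ood_acc:.2%}")
print(f"absolute drop: {abs_drop:.2f} points, relative drop: {rel_drop:.2f}%")
```

The two measures coincide only when the in-distribution accuracy is 100%, as in this toy data. At an in-distribution accuracy near 98.66% they would differ slightly, so it is worth checking which definition the full paper uses before comparing models.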
