May 9, 2026 · Open Access

NLP System for Automation of Document Workflow in a Research and Development Organization—A Case Study

Key Points

This study aims to design and evaluate an NLP system for automating document workflows in R&D organizations.
The system runs on-premise, using a LangGraph architecture and local open-weight large language models (LLMs).
The classification agent was evaluated across twelve local LLMs under two testing conditions: known document types only, and known types mixed with unclassified (out-of-distribution) data.
Each model was assessed for both accuracy and computational efficiency (processing time).
The cogito:70b model achieved 97.3% accuracy with known data and 94.3% with unclassified data.
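The magistral:24b and qwen3:32b models ranked next in overall accuracy while requiring less than half the processing time of the 70B model.
Of these two, magistral:24b performed better on strictly defined document streams, while qwen3:32b was more robust on out-of-distribution inputs.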
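
The comparison described in the key points (accuracy and processing time per model, measured under each testing condition) can be illustrated with a minimal sketch. This is not the authors' evaluation harness: the `evaluate` function, the sample format, and the toy classifier are hypothetical stand-ins, and a real run would replace the toy classifier with a call to a local LLM.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical sample format: (document text, expected label).
Sample = Tuple[str, str]


def evaluate(classifier: Callable[[str], str], samples: List[Sample]) -> Tuple[float, float]:
    """Return (accuracy, elapsed seconds) for one classifier on one sample set."""
    if not samples:
        raise ValueError("samples must not be empty")
    start = time.perf_counter()
    correct = sum(1 for text, expected in samples if classifier(text) == expected)
    elapsed = time.perf_counter() - start
    return correct / len(samples), elapsed


def toy_classifier(text: str) -> str:
    """Stand-in for an LLM call; the labels are illustrative assumptions."""
    return "invoice" if "invoice" in text.lower() else "unclassified"


# Condition 1: known document types only.
known_only = [("Invoice no. 17", "invoice"), ("INVOICE for services", "invoice")]
# Condition 2: the same set plus out-of-distribution documents.
with_unknown = known_only + [("A poem about spring", "unclassified"),
                             ("Forwarded invoice joke", "unclassified")]

acc_known, secs_known = evaluate(toy_classifier, known_only)
acc_mixed, secs_mixed = evaluate(toy_classifier, with_unknown)
```

Running the same function once per model and per condition yields the kind of accuracy-versus-time table that makes the trade-off between a 70B model and smaller models visible.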

Abstract

Research and development (R&D) organizations face significant operational bottlenecks due to the manual processing of diverse, unstructured documents. This paper presents the design, implementation, and pilot evaluation of an on-premise, multi-agent natural language processing (NLP) system developed for the GIG National Research Institute (GIG-NRI). Built upon a LangGraph architecture, the system utilizes open-weight large language models (LLMs) to perform zero-shot document classification, dynamic routing, and specialized information extraction. We rigorously evaluated the classification agent across twelve different local LLMs under two distinct testing regimes: first, using a strictly defined dataset of known administrative and scientific document types, and second, introducing a subset of out-of-distribution (unclassified) data to test real-world robustness. Our results demonstrate that the 70-billion parameter model (cogito:70b) achieved a peak accuracy of 97.3% in the first regime and maintained a strong 94.3% accuracy when confronted with out-of-spec data. However, our analysis reveals a critical operational trade-off regarding computational efficiency. The 24-billion parameter (magistral:24b) and 32-billion parameter (qwen3:32b) models emerged as the next best in overall accuracy while requiring less than half the processing time of their 70B counterpart. Notably, magistral:24b proved superior for strictly defined document streams, whereas qwen3:32b demonstrated greater robustness when handling out-of-distribution inputs. Furthermore, we demonstrate the efficacy of heterogeneous model assignments for complex multi-stage tasks, such as Scientific Article summarization via hierarchical Map-Reduce.
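
The pipeline outlined in the abstract (zero-shot classification, routing by document type, and hierarchical Map-Reduce summarization with a different model per stage) can be sketched in plain Python. This is a hedged illustration and not the paper's implementation: it does not use LangGraph, the label names, prompts, `fan_in` and `chunk_size` defaults, and the manual-review fallback are all assumptions, and each `LLM` callable stands in for a call to a locally hosted model.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # prompt in, completion out (stand-in for a local model)

# Hypothetical label set; the paper's actual document types are not listed here.
KNOWN_LABELS = ["invoice", "scientific_article", "leave_request"]
FALLBACK_LABEL = "unclassified"


def classify(text: str, llm: LLM) -> str:
    """Zero-shot classification: request one label without any training examples."""
    prompt = (
        "Classify the document into exactly one of: "
        + ", ".join(KNOWN_LABELS + [FALLBACK_LABEL])
        + ".\nAnswer with the label only.\n\nDocument:\n" + text
    )
    answer = llm(prompt).strip().lower()
    # Any answer outside the known set is treated as out-of-distribution.
    return answer if answer in KNOWN_LABELS else FALLBACK_LABEL


def summarize_map_reduce(chunks: List[str], llm: LLM, fan_in: int = 4) -> str:
    """Summarize each chunk (map), then merge summaries in groups until one remains (reduce)."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""
    level = [llm("Summarize:\n" + chunk) for chunk in chunks]
    while len(level) > 1:
        level = [
            llm("Combine these summaries:\n" + "\n".join(level[i:i + fan_in]))
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


def route(text: str, classifier_llm: LLM, worker_llm: LLM, chunk_size: int = 2000) -> Dict[str, str]:
    """Classify with one model, then dispatch to a type-specific step run by another."""
    label = classify(text, classifier_llm)
    if label == "scientific_article":
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        return {"label": label, "result": summarize_map_reduce(chunks, worker_llm)}
    if label == FALLBACK_LABEL:
        # Assumed policy: unknown documents are not processed automatically.
        return {"label": label, "result": "queued for manual review"}
    return {"label": label, "result": worker_llm(f"Extract the key fields from this {label}:\n" + text)}
```

Passing separate `classifier_llm` and `worker_llm` callables mirrors the idea of heterogeneous model assignment: a fast, accurate classifier can sit in front of a different model chosen for extraction or summarization.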
