What type of study is this?

This is an experimental study.

October 27, 2025 | Open Access

Evaluation Framework for Fault Diagnosis Using Technical Manuals in Retrieval-Augmented Large Language Models

Key Points

Retrieval-based methods outperformed LLM-based ones on the label-matching task, showing that system design is crucial in fault diagnosis.
Results indicate that task design must be aligned with evaluation goals; the chat-style task showed challenges in linking observations to fault codes from the manual.
The framework offers industrial insights, emphasizing the need for evaluation-first validation and realistic piloting of systems.
Retrieval-first approaches are viable alternatives to LLMs in technical language processing (TLP) applications.

Abstract

Fault diagnosis is a time-intensive maintenance task often reliant on the expertise of senior technicians. As this workforce ages and demand grows for digital tools, there is a growing need to capture and automate this knowledge while maintaining the precision required for technical applications. This study introduces an evaluation-driven framework for fault code recommendation, applied to a ground vehicle diagnosis system. Two tasks were designed to reflect potential system configurations: (1) a chat-style task simulating large language model (LLM) interaction, and (2) a label-constrained task using structured fault codes from technical manuals. Multiple retrieval-augmented generation (RAG) configurations were compared against LLM-only and retrieval-only baselines. Results showed that retrieval-based methods outperformed LLM-based ones for label-matching tasks, while the chat task showed challenges in linking observations to fault codes from the manual. These results highlight the importance of aligning task design with evaluation goals and considering retrieval-first approaches as viable alternatives to LLMs in technical language processing (TLP) applications. Beyond experimental findings, we outline industrial lessons learned: the importance of aligning system design to use case goals, adopting evaluation-first validation, and the need to pilot LLM-based systems under realistic conditions. These lessons provide practical guidance for developing effective diagnostic support systems in industrial contexts.
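To make the label-constrained task concrete, the following is a minimal sketch of what a retrieval-only baseline could look like. It is not the authors' implementation: the fault codes, their descriptions, the bag-of-words cosine scoring, and the `recommend` function are all hypothetical and chosen only for illustration. The point it shows is that a retrieval-first system can only return codes that exist in the manual, which is the constraint the label-matching task evaluates.

```python
import math
import re
from collections import Counter

# Hypothetical fault codes and descriptions, invented for this sketch.
# They are not taken from the paper or from any real technical manual.
MANUAL = {
    "F101": "engine fails to start battery voltage low",
    "F205": "coolant temperature high engine overheating",
    "F310": "brake pressure low warning light on",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(observation, manual=MANUAL, k=2):
    """Return the k fault codes whose descriptions best match the observation.

    The output is label-constrained by construction: every returned code
    is a key of `manual`, so no code can be invented.
    """
    query = Counter(tokenize(observation))
    scored = [
        (cosine(query, Counter(tokenize(description))), code)
        for code, description in manual.items()
    ]
    # Highest score first; ties broken alphabetically by code for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored[:k]]
```

Under these assumptions, calling `recommend("engine overheating, coolant temperature is high", k=1)` ranks the hypothetical code `F205` first. A real system would likely replace the bag-of-words scoring with a stronger retriever, but the label constraint would work the same way.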
