This study develops a domain-adaptive multimodal RAG (Retrieval-Augmented Generation) system to improve the accuracy and efficiency of technical question answering over large-scale structured manuals. Using Hyundai Staria maintenance documents as a case study, we extracted text and images from PDF manuals and constructed QA, RAG, and Multi-Turn datasets that reflect realistic troubleshooting scenarios. To overcome the limitations of baseline RAG models, we proposed an enhanced architecture that incorporates sentence-level similarity annotations and parameter-efficient fine-tuning via LoRA (Low-Rank Adaptation), built on the bLLossom-8B language model and the BAAI-bge-m3 embedding model. Experimental results show that, compared with existing RAG systems, the proposed system improved BERTScore by 3.0 percentage points (%p), cosine similarity by 3.0%p, and ROUGE-L by 18.0%p, with notable gains in the accuracy of image-guided responses. A qualitative evaluation by 20 domain experts yielded an average satisfaction score of 4.4 out of 5. This study presents a practical and extensible AI framework for multimodal document understanding, applicable to technical documentation in the automotive, industrial, and defense sectors.
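Two of the reported scores, cosine similarity and ROUGE-L, are easy to misread without their definitions. The sketch below shows one common way to compute them; it is not the authors' evaluation code. It assumes whitespace tokenization, ROUGE-L reported as an F1 score (beta = 1, no stemming), and toy vectors standing in for the embeddings that an encoder such as the bge-m3 model named above would produce. The example sentences and vectors are hypothetical.

```python
import math
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    # prev[j] holds the LCS length of the tokens seen so far in `a` and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (assumed tokenization, beta = 1)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical generated answer vs. reference answer from a manual.
    reference = "check the coolant level before starting the engine"
    candidate = "check coolant level before engine start"
    print(round(rouge_l_f1(candidate, reference), 4))  # 0.7143 (LCS = 5 tokens)
    # Toy 2-D vectors in place of real sentence embeddings.
    print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))   # 0.96
```

Because ROUGE-L rewards long in-order token overlap while cosine similarity compares whole-sentence embeddings, a system can gain much more on one than the other, which is consistent with the larger ROUGE-L gain reported above.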
Yerin Nam
Hyeung‐Sik Choi
Jonggeun Choi
Applied Sciences
Seoul National University of Science and Technology
Nam et al. studied this question.
www.synapsesocial.com/papers/68c1aad354b1d3bfb60e3a73 — DOI: https://doi.org/10.3390/app15158387