June 6, 2026 | Open Access

Multimodal RAG System for Rose Cultivation Using Paragraph-level Chunking and VLM-based Image Captioning

Key Points

The study aims to improve the accuracy of RAG systems in agriculture by integrating textual and visual data from rose cultivation documents.
The authors developed a multimodal RAG system that combines paragraph-level chunking with VLM-based image captioning.
Text and images are extracted separately from the PDF, and a vector database is built from GPT-4o-generated image captions integrated with their surrounding context.
Performance was evaluated by comparing Recall@5 and Answer Relevancy against a conventional text-based RAG system.
The proposed system improved Recall@5 by 28.8% over the text-based baseline.
Answer Relevancy improved by 33.3% over the same baseline.
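
For readers unfamiliar with the retrieval metric named above, the following is a minimal sketch of one common definition of Recall@k: the fraction of relevant items that appear among the top k retrieved items, averaged over queries. The function names and the sample IDs are hypothetical, and the paper may define or compute Recall@5 differently.

```python
from typing import Iterable, Sequence


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Iterable[str], k: int = 5) -> float:
    """Fraction of relevant items found in the top-k retrieved items.

    Assumption: this is one common definition; it is not taken from the paper.
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must contain at least one item")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: Sequence[tuple], k: int = 5) -> float:
    """Average recall_at_k over (retrieved_ids, relevant_ids) pairs, one per query."""
    if not runs:
        raise ValueError("runs must not be empty")
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


# Hypothetical chunk IDs: "c6" is relevant but ranked sixth, so it falls outside the top 5.
example = recall_at_k(["c1", "c2", "c3", "c4", "c5", "c6"], ["c2", "c6"], k=5)
```

In this example only one of the two relevant chunks is ranked in the top five, so the score is 0.5.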

Abstract

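Research on RAG for exploiting knowledge contained in PDF documents has recently been active. However, manuals in agricultural domains such as rose cultivation require visual information as well as text for diagnosing pests and diseases and for judging growth status. Existing text-centric RAG systems cannot process the images inside a PDF, and their fixed-length chunking breaks context, which lowers retrieval accuracy. This study therefore proposes a multimodal RAG system that combines paragraph-level chunking with VLM-based image captioning. The proposed system extracts text and images separately and builds a vector database in which image captions generated by GPT-4o are integrated with contextual information. In experiments, the proposed system outperformed an existing text-based RAG system, with improvements of 28.8% in Recall@5 and 33.3% in Answer Relevancy.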
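
The two ideas in the abstract, paragraph-level chunking and captions integrated with context, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Chunk` type, the function names, the blank-line paragraph rule, and the way the caption is joined to a nearby paragraph are hypothetical and are not taken from the paper. The caption is passed in as a plain string, so no VLM call is shown.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str     # "text" or "image"
    content: str  # the text that would be embedded into the vector database
    source: str   # hypothetical identifier, e.g. a page label


def paragraph_chunks(page_text: str, source: str) -> list:
    """Split on blank lines so each chunk is one whole paragraph.

    Assumption: paragraphs are separated by blank lines in the extracted text.
    Unlike fixed-length chunking, no paragraph is cut in the middle.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", page_text) if p.strip()]
    # Collapse line wraps inside a paragraph into single spaces.
    return [Chunk("text", " ".join(p.split()), source) for p in paragraphs]


def caption_chunk(caption: str, nearby_paragraph: str, source: str) -> Chunk:
    """Combine an image caption with surrounding text into one retrievable chunk.

    Assumption: the caption was produced elsewhere (for example by a VLM) and
    the nearby paragraph stands in for the contextual information.
    """
    content = f"Image caption: {caption}\nContext: {nearby_paragraph}"
    return Chunk("image", content, source)


page = "Powdery mildew appears as white powder\non leaves.\n\nPrune in late winter."
chunks = paragraph_chunks(page, source="page-1")
image = caption_chunk("Rose leaf with white fungal patches", chunks[0].content, "page-1")
```

Both text chunks and caption chunks are plain strings at this point, so the same embedding model could index them side by side, which is one way a text query could retrieve image-derived content.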

Authors

Ji-Wan Han

Chang-Pyo Yoon

Korea Advanced Institute of Science and Technology

Chi-Gon Hwang

Journals

The Journal of the Korean Institute of Information and Communication Engineering
