Geological reports contain abundant domain-specific knowledge and unstructured textual data, which makes it difficult to extract meaningful information for engineering decision-making. Recent advances in large language models (LLMs) offer promising solutions. This study benchmarks eight state-of-the-art LLMs on two key tasks, knowledge graph (KG) construction and question answering (QA), which are crucial for extracting and structuring information from extensive unstructured geological text and thereby supporting risk assessment. We evaluate both proprietary and open-source models using advanced prompt engineering techniques such as in-context learning (ICL), chain-of-thought (CoT), and the proposed knowledge-injected (KI) strategies. The results indicate that, in the zero-shot setting, DeepSeek-V3 excels in KG construction, while DeepSeek-R1 outperforms other models in QA tasks. Prompt engineering has varying impacts: ICL enhances the overall performance of KG tasks and the precision score of QA-factoid tasks; KI improves the exact match in KG but does not significantly affect the matching score based on semantic similarity; and CoT boosts QA precision through step-by-step reasoning. Human evaluation confirms high factual consistency in models like GPT-4, while others, such as GPT-3.5, exhibit limitations. To enhance practical applicability, we have developed an open-source, interactive platform that integrates all benchmarked LLMs and prompt strategies, enabling researchers to analyze unstructured geological texts in real time. Despite these advances, challenges such as hallucinations and domain-specific comprehension remain. Our findings emphasize the potential of LLMs in geological text analysis while also highlighting the need for further refinement to ensure their reliability in geological risk management applications.
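The abstract distinguishes an exact-match score for KG construction from a score based on semantic similarity. As a rough illustration of the exact-match side only, the sketch below scores predicted (head, relation, tail) triples against a gold set. The function names, the normalization step, and the sample triples are all hypothetical; the paper's actual scoring protocol is not described in the abstract and may differ.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail); this layout is an assumption


def normalize(triple: Triple) -> Triple:
    """Lowercase and collapse whitespace so formatting noise is not counted as an error."""
    return tuple(" ".join(part.lower().split()) for part in triple)


def exact_match_scores(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Dict[str, float]:
    """Precision, recall, and F1 over triples that match the gold set exactly after normalization."""
    pred_set = {normalize(t) for t in predicted}
    gold_set = {normalize(t) for t in gold}
    hits = len(pred_set & gold_set)
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical sample data, not taken from the paper.
gold_triples = [
    ("Fault F1", "cuts", "sandstone layer"),
    ("slope", "has_risk", "landslide"),
]
predicted_triples = [
    ("fault f1", "cuts", "Sandstone  layer"),  # matches after normalization
    ("slope", "has_risk", "rockfall"),         # wrong tail entity
]
scores = exact_match_scores(predicted_triples, gold_triples)
```

Under exact matching, a prediction such as ("slope", "has_risk", "rockfall") earns no credit even if it is close in meaning to the gold triple, which is why a similarity-based score can move differently from exact match under the same prompt strategy.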
Qi Ge
Pengfa Li
Jin Li
Journal of Rock Mechanics and Geotechnical Engineering
Zhejiang University
Nanjing University of Information Science and Technology
Nanjing Forestry University
Ge et al. studied this question.
www.synapsesocial.com/papers/69af947370916d39fea4b735 — DOI: https://doi.org/10.1016/j.jrmge.2025.12.038