What type of study is this?

This is a quantitative study.

September 18, 2025 | Open Access

Towards HydroLLM: approaches for building a domain-specific language model for hydrology

Key Points

HydroLLM aims to create a domain-specific language model for effective hydrological analysis and decision support.
The study constructed a dataset of approximately 8,800 hydrology-focused question–answer pairs for model training.
Fine-tuning using different LLMs showed that model capacity must align with dataset size for optimal performance.
The research highlighted the balance between model size and performance, underscoring the importance of architecture in domain adaptation.

Abstract

As large language models (LLMs) continue to expand, their effective adaptation to specialized fields remains a critical challenge. This work presents an initial step toward the development of HydroLLM, a domain-specific LLM for hydrology. We construct a dataset of approximately 8,800 hydrology-focused question–answer pairs, each with a supporting context passage drawn from textbooks and scientific articles. The dataset includes four instructional formats: multiple-choice, true/false, fill-in-the-blank, and open-ended. Using this corpus, we fine-tune several LLMs of varying type and scale, from compact (1.5B) to large (32B) parameter counts, using parameter-efficient LoRA (low-rank adaptation) methods. Our methodology compares the fine-tuned models and evaluates performance using accuracy and cosine similarity metrics across task types. Results show that the 8B-DeepSeek-Llama variant achieved the strongest overall performance, while the 32B model overfitted and the 1.5B model underperformed. This demonstrates that larger size is not always advantageous and highlights the need to match model capacity to dataset size. More broadly, effective domain adaptation requires careful consideration of architecture, parameter count, and task complexity. By establishing performance and identifying the limits of current fine-tuning approaches, we take a concrete step toward building HydroLLM as a robust, domain-specific language model for hydrological analysis and decision support.
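The abstract names accuracy and cosine similarity as the evaluation metrics but does not say how they were computed. The sketch below is one plausible reading, not the authors' implementation: exact-match accuracy for the closed-form formats (multiple-choice, true/false, fill-in-the-blank) and cosine similarity for open-ended answers. The function names are hypothetical, and the bag-of-words vectors are an assumption standing in for whatever text representation the study actually used.

```python
import math
import re
from collections import Counter


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference after
    trimming whitespace and lowercasing (assumed normalization)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-count vectors of two texts.

    Assumption: simple bag-of-words counts; a real evaluation would
    more likely compare sentence embeddings.
    """
    counts_a = Counter(re.findall(r"\w+", text_a.lower()))
    counts_b = Counter(re.findall(r"\w+", text_b.lower()))
    dot = sum(count * counts_b[token] for token, count in counts_a.items())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

Under this reading, each closed-form task type would receive one accuracy score, and each open-ended answer would be scored against its reference answer and the scores averaged per model.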

Authors

Dilara Kizilkaya

University of Iowa

Yusuf Sermet

Tulane University

İbrahim Demir

Tulane University

Journals

Journal of Hydroinformatics

Institutions

University of Iowa

Tulane University
