What question did this study set out to answer?

The aim is to identify design choices that optimize large language models for domain-specific knowledge extraction.

April 23, 2026 · Open Access

LLM‐Based Scientific Assistants for Knowledge Extraction: Which Design Choices Matter?

David Exler (Kerntechnische Entsorgung Karlsruhe, Germany), Manuel Raimann (Kerntechnische Entsorgung Karlsruhe, Germany), Marc Münker (Karlsruhe Institute of Technology)

Key Points

Introduced the LLM Playground for optimizing LLMs.
Utilized prompt engineering, external knowledge integration, and reasoning strategies.
Set up a chemical chatbot as a case study and compared its performance using ChemBench.
Demonstrated improved accuracy in answering domain-specific questions.
Provided tested architectures ready for deployment in specialized applications.
Compared the efficacy of the individual optimization techniques, both in isolation and in combination.

Abstract

Large language model (LLM) chatbots have gained significant popularity, offering knowledge to support specialists in diverse fields. However, adapting models to specific use cases and specialized domains presents considerable challenges. We therefore introduce the LLM Playground, a comprehensive approach to optimizing LLMs for specialist applications with respect to their accuracy in answering domain‐specific questions, addressing the limitations of unmodified models. The optimization techniques begin with prompt engineering, advance to the integration of external knowledge, and culminate in complex reasoning strategies or self‐feedback loops. This paper introduces various architectures for scientific assistants that apply individual enhancement techniques both in isolation and in combination, so that they can be compared directly. To demonstrate the efficacy of the LLM Playground, a chemical chatbot is set up as a case study, and the optimization techniques are compared by measuring the chatbot's performance on ChemBench, an independent question–answer benchmark for the chemical domain. By providing tested, ready‐to‐deploy architectures and clear use‐case guidance, this work helps researchers and practitioners leverage LLMs in domain‐specific applications. The insights and methodologies presented contribute to the growing body of knowledge on tailoring LLMs to the unique demands of specialized fields.
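To make the idea of comparing techniques in isolation and in combination concrete, the following is a minimal Python sketch, not the LLM Playground itself. Everything in it is an assumption made for illustration: `toy_model` is a stand-in for a real LLM call, the keyword lookup is a placeholder for external knowledge integration, and the three-question list is a made-up miniature, not ChemBench. Reasoning strategies and self-feedback loops are left out for brevity.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical abstraction: a "model" is any function from prompt text to answer text.
Model = Callable[[str], str]


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call: it can only repeat a fact found in its prompt."""
    for line in prompt.splitlines():
        if line.startswith("FACT:"):
            # Return whatever follows the last "=" on the fact line.
            return line[len("FACT:"):].split("=")[-1].strip()
    return "unknown"


def with_prompt_engineering(model: Model, system_prompt: str) -> Model:
    """Wrap a model so that a fixed instruction precedes every question."""
    def wrapped(question: str) -> str:
        return model(f"{system_prompt}\n\n{question}")
    return wrapped


def with_external_knowledge(model: Model, knowledge: Dict[str, str]) -> Model:
    """Wrap a model so that stored facts matching a keyword are added as context."""
    def wrapped(question: str) -> str:
        lowered = question.lower()
        facts = [fact for key, fact in knowledge.items() if key in lowered]
        if not facts:
            return model(question)
        context = "\n".join(facts)
        return model(f"Context:\n{context}\n\n{question}")
    return wrapped


def accuracy(model: Model, benchmark: List[Tuple[str, str]]) -> float:
    """Fraction of questions answered with an exact (case-insensitive) match."""
    correct = sum(model(q).strip().lower() == a.lower() for q, a in benchmark)
    return correct / len(benchmark)


# Illustrative data only; neither list comes from the paper or from ChemBench.
KNOWLEDGE = {
    "water": "FACT: formula of water = H2O",
    "table salt": "FACT: formula of table salt = NaCl",
}
BENCHMARK = [
    ("What is the formula of water?", "H2O"),
    ("What is the formula of table salt?", "NaCl"),
    ("What is the formula of methane?", "CH4"),
]
SYSTEM_PROMPT = "Answer with the chemical formula only."

# Each architecture is one technique alone or a combination of techniques.
architectures: Dict[str, Model] = {
    "baseline": toy_model,
    "prompt": with_prompt_engineering(toy_model, SYSTEM_PROMPT),
    "knowledge": with_external_knowledge(toy_model, KNOWLEDGE),
    "prompt+knowledge": with_prompt_engineering(
        with_external_knowledge(toy_model, KNOWLEDGE), SYSTEM_PROMPT
    ),
}

results = {name: accuracy(model, BENCHMARK) for name, model in architectures.items()}
for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

Because each technique is a wrapper around a `Model`, architectures can be stacked freely and scored with the same `accuracy` function, which is the property that makes a side-by-side comparison straightforward. The scores of this toy setup say nothing about the paper's findings.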
