September 10, 2025

Exploring Large Language Models for Scientific Question Answering via Natural Language to SPARQL Translation

Key Points

Fine-tuning and prompting, used together, significantly improve language model performance, especially with strategic few-shot selection.
The combined approach achieves excellent results on both the SciQA and DBLP-QuAD benchmarks.
The analysis identifies common error patterns and opportunities for transfer learning in knowledge graph question answering.
Because models already perform so well on these benchmarks, more challenging ones are needed to evaluate model capabilities.

Abstract

Translating scientific questions expressed in natural language into SPARQL queries that can be executed over knowledge graphs remains a significant challenge in the field of question answering. Recently, several prominent benchmarks, notably SciQA and DBLP-QuAD, have emerged to evaluate performance in this domain. In this paper, we provide a comprehensive analysis of the performance of language models on these benchmarks, assessing various optimization strategies. Our results indicate that the combined use of fine-tuning and prompting techniques, especially when incorporating strategic few-shot selection, produces excellent results on both benchmarks. These findings underscore an urgent need for more challenging benchmarks to better assess model capabilities. We identify key insights, common error patterns, and potential opportunities for transfer learning, and we discuss their implications for optimizing the performance of large language models in knowledge graph-based question answering tasks.
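
To make the idea of few-shot selection for natural language to SPARQL translation more concrete, the following is a minimal sketch and not the paper's method. It assumes that demonstrations are chosen by simple token overlap (Jaccard similarity) with the incoming question. The example pairs, the `ex:` predicates, and the helper names `select_few_shot` and `build_prompt` are all hypothetical.

```python
import re

# Hypothetical question/SPARQL pairs; the ex: predicates are illustrative only
# and do not come from SciQA or DBLP-QuAD.
EXAMPLES = [
    ("Which papers were written by Alice Smith?",
     'SELECT ?paper WHERE { ?paper ex:authoredBy ?a . ?a rdfs:label "Alice Smith" . }'),
    ("How many papers were published in 2020?",
     'SELECT (COUNT(?paper) AS ?n) WHERE { ?paper ex:yearOfPublication "2020" . }'),
    ("What venue published the paper titled Graph Search?",
     'SELECT ?venue WHERE { ?paper ex:title "Graph Search" . ?paper ex:publishedIn ?venue . }'),
]


def tokens(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_few_shot(question, examples, k=2):
    """Return the k examples whose questions overlap most with the input."""
    q = tokens(question)
    ranked = sorted(examples, key=lambda ex: jaccard(q, tokens(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(question, examples, k=2):
    """Assemble an instruction, the selected demonstrations, and the new question."""
    parts = ["Translate the question into a SPARQL query."]
    for nl, sparql in select_few_shot(question, examples, k):
        parts.append(f"Question: {nl}\nSPARQL: {sparql}")
    parts.append(f"Question: {question}\nSPARQL:")
    return "\n\n".join(parts)


prompt = build_prompt("How many papers were published in 2019?", EXAMPLES, k=1)
```

The resulting `prompt` string would be passed to a prompted or fine-tuned model. An embedding-based similarity measure could replace the token-overlap score without changing the rest of the sketch.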
