April 19, 2026 · Open Access

Natural Language as an Interface for Structured Knowledge

Key Points

Aims to improve how AI systems retrieve information from cultural heritage knowledge graphs through natural language
Investigates NL-to-SPARQL translation for knowledge graphs
Proposes a hybrid neuro-symbolic architecture intended to reduce LLM hallucinations
Combines graph exploration and symbolic validation on cultural heritage data
Plans benchmark datasets of natural language questions paired with gold SPARQL queries
Targets more robust querying of cultural heritage graphs
Uses iterative refinement, feeding detected errors back into query generation
Seeks to improve interpretability and reuse in Digital Humanities

Abstract

This PhD project investigates how answers given by AI systems can be grounded in expert-curated facts with explicit worldviews and provenance, combining transparent deductive reasoning over knowledge graphs with the flexibility of natural language. It also addresses the reverse direction: how Large Language Models (LLMs) can make cultural heritage knowledge graphs easier and more intuitive to access across domains.

The research starts from a recurring problem in Knowledge Graph Question Answering (KGQA): LLM performance becomes fragile on large, complex cultural heritage graphs. In this setting, event-centric structures, long multi-hop paths, heterogeneous modelling practices, and frequent divergence between the ontology and the instantiated data make query generation particularly error-prone. The project asks whether a distillation (summarisation) of a knowledge graph can improve NL-to-SPARQL translation compared with approaches that rely only on ontology-level descriptions or on heavy injection of the full graph. More specifically, it explores two research questions: first, whether a hybrid neuro-symbolic architecture can reduce hallucinations; and second, how such an approach can be evaluated reproducibly on cultural heritage knowledge graphs.

The project combines graph exploration, symbolic validation, question interpretation, and LLM-based query generation. A distillation process extracts the "data model" (class-property patterns, multi-hop structures, usage frequencies) from a SPARQL endpoint. These patterns are encoded and injected into prompt-based NL-to-SPARQL generation. Generated queries are validated in layers: syntactic correctness, lexical validity of the referenced entities and properties, and compatibility with the actual data model. Detected errors trigger iterative refinement, in which explicit constraints are fed back into the generation loop. The research also includes an evaluation component.
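The pipeline described above can be sketched in Python. Its four steps are distillation, prompt encoding, layered validation, and the refinement loop. This is a minimal illustration under stated assumptions, not the project's implementation:

- The graph is an in-memory list of (subject, predicate, object) triples rather than a live SPARQL endpoint. Against an endpoint, the same counts would typically come from a `GROUP BY`/`COUNT` aggregate query.
- Only single-hop class-property-class patterns are extracted.
- The syntax check is a crude stand-in for a real SPARQL parser.
- Queries are assumed to be simple basic graph patterns written with prefixed names.
- The LLM is replaced by a caller-supplied `generate` function.

All function names, the `ex:` vocabulary, and the toy data are hypothetical.

```python
import re
from collections import Counter

RDF_TYPE = "rdf:type"


def distill_patterns(triples):
    """Count (subject class, property, object class) patterns in instance data."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    patterns = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        # Nodes without an rdf:type are treated as literals in this sketch.
        for sc in sorted(types.get(s, {"Literal"})):
            for oc in sorted(types.get(o, {"Literal"})):
                patterns[(sc, p, oc)] += 1
    return patterns


def encode_patterns(patterns, top_k=20):
    """Render the most frequent patterns as plain-text lines for an LLM prompt."""
    lines = [f"{sc} -[{p}]-> {oc} (seen {n}x)"
             for (sc, p, oc), n in patterns.most_common(top_k)]
    return "Observed data patterns:\n" + "\n".join(lines)


def triple_patterns(query):
    """Naively split the WHERE block into (s, p, o) triples."""
    # Assumes one triple per '.'-terminated statement and no dots inside terms.
    body = query[query.find("{") + 1:query.rfind("}")]
    triples = []
    for stmt in body.split("."):
        parts = stmt.split()
        if len(parts) == 3:
            s, p, o = parts
            triples.append((s, RDF_TYPE if p == "a" else p, o))
    return triples


def validate_query(query, patterns):
    """Layered checks: syntax, then vocabulary, then data-model compatibility."""
    # Layer 1: crude syntactic check (a real system would use a SPARQL parser).
    balanced = "{" in query and query.count("{") == query.count("}")
    if not balanced or not re.search(r"\b(SELECT|ASK)\b", query, re.IGNORECASE):
        return ["syntax: missing or unbalanced braces, or no SELECT/ASK"]
    triples = triple_patterns(query)
    classes = {c for sc, _, oc in patterns for c in (sc, oc)}
    props = {p for _, p, _ in patterns}
    # Layer 2: properties and asserted classes must occur in the distilled patterns.
    errors = [f"lexical: unknown property {p}"
              for _, p, _ in triples if p != RDF_TYPE and p not in props]
    errors += [f"lexical: unknown class {o}"
               for _, p, o in triples if p == RDF_TYPE and o not in classes]
    # Layer 3: a known property must be attested on the variable's declared class.
    var_class = {s: o for s, p, o in triples if p == RDF_TYPE}
    for s, p, _ in triples:
        if p in props and s in var_class and not any(
                sc == var_class[s] and pp == p for sc, pp, _ in patterns):
            errors.append(f"data model: {p} is never used on {var_class[s]}")
    return errors


def refine(generate, question, patterns, max_rounds=3):
    """Regenerate with the previous round's errors as explicit constraints."""
    feedback = []
    for _ in range(max_rounds):
        query = generate(question, encode_patterns(patterns), feedback)
        feedback = validate_query(query, patterns)
        if not feedback:
            return query
    return None  # no valid query within the round budget


if __name__ == "__main__":
    # Hypothetical toy graph: an object produced by a production event.
    graph = [
        ("ex:o1", RDF_TYPE, "ex:Object"),
        ("ex:e1", RDF_TYPE, "ex:Production"),
        ("ex:p1", RDF_TYPE, "ex:Person"),
        ("ex:o1", "ex:producedBy", "ex:e1"),
        ("ex:e1", "ex:carriedOutBy", "ex:p1"),
    ]

    def fake_llm(question, prompt, feedback):
        # Stand-in for an LLM call: hallucinates a property until given feedback.
        prop = "ex:producedBy" if feedback else "ex:createdBy"
        return f"SELECT ?e WHERE {{ ?o a ex:Object . ?o {prop} ?e . }}"

    pats = distill_patterns(graph)
    print(encode_patterns(pats))
    print(refine(fake_llm, "Which events produced objects?", pats))
```

The validator reports each error as a short message naming the offending property or class. These messages can therefore be passed straight back to the generator as constraints, which is the role the `feedback` argument plays in `refine`.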
The project will create benchmark datasets from cultural heritage graphs, featuring manually curated natural language questions and gold SPARQL queries, to compare zero-shot, ontology-guided, and data-pattern-guided generation. The PhD's main contribution is methodological: defining and testing a framework that grounds LLM-based querying in the structure of cultural heritage knowledge graphs, with the aim of improving robustness, interpretability, and reuse in Digital Humanities and Cultural Heritage research.
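The comparison between generation strategies can be sketched as an execution-based evaluation harness. This is a minimal sketch under stated assumptions. Answers are compared as sets and scored with answer-set F1 averaged over questions. This metric is common in KGQA evaluation, but the abstract does not name it. `execute` stands in for running a query against a SPARQL endpoint. `BenchmarkItem`, `answer_f1`, `evaluate`, the strategy names, and the toy queries are all hypothetical and imply nothing about the project's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkItem:
    question: str
    gold_query: str


def answer_f1(predicted, gold):
    """F1 between predicted and gold answer sets; two empty sets count as a match."""
    if not predicted and not gold:
        return 1.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(strategies, benchmark, execute):
    """Mean answer-set F1 for each generation strategy over the benchmark.

    strategies: name -> callable(question) returning a SPARQL string or None
    execute:    callable(query) returning a set of answers (endpoint stand-in)
    """
    scores = {}
    for name, generate in strategies.items():
        total = 0.0
        for item in benchmark:
            query = generate(item.question)
            # A strategy that produces no query scores against an empty answer set.
            predicted = execute(query) if query else set()
            total += answer_f1(predicted, execute(item.gold_query))
        scores[name] = total / len(benchmark)
    return scores


if __name__ == "__main__":
    # Toy data only: queries are mapped to answer sets instead of being run.
    gold = "SELECT ?o WHERE { ?o a ex:Object . }"
    wrong = "SELECT ?o WHERE { ?o ex:createdBy ?e . }"
    results = {gold: {"ex:o1", "ex:o2"}, wrong: set()}
    bench = [BenchmarkItem("Which objects are in the collection?", gold)]
    strategies = {"strategy_a": lambda q: wrong, "strategy_b": lambda q: gold}
    print(evaluate(strategies, bench, lambda q: results.get(q, set())))
```

Scoring every strategy on the same question set keeps the comparison like-for-like. A fuller harness would also record syntax failures and per-question scores rather than only the mean.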

Cite This Study

Remo Grillo, "Natural Language as an Interface for Structured Knowledge." https://doi.org/10.5281/zenodo.19632674

synapsesocial.com/papers/69e473ff010ef96374d8fbae
