September 5, 2025 · Open Access

Talk data to me! Evaluating the potential for large language models to enhance data discoverability across UKRI’s federated data services

Key Points

Low trust in large language models among academic researchers limits their use in data discovery.
Focus group feedback indicated the need for training on large language model resources to enhance usability.
A semantic search tool integrating various data catalogues effectively enhances data discoverability for researchers.
Concerns about errors in large language model outputs highlight the importance of transparency in data search tools.

Abstract

Objectives

The talk will: (i) describe the development of a large language model (LLM) powered semantic search tool for UKRI data catalogues, and (ii) examine the concerns and opportunities that researchers see in using this tool for data discovery.

Methods

A semantic search tool was developed that integrates the data catalogues of Administrative Data Research UK, Consumer Data Research Centre, and UK Data Service. We used OpenAI’s vector embedding service to convert the catalogue metadata into embeddings, allowing natural language search to be used rather than keywords only. We assessed the acceptability and suitability of this tool using four focus groups. Participants (n=36) were recruited from among academic researchers, PhD researchers, data services staff, and local government / third sector analysts. Data collected from the focus groups were analysed using thematic analysis.

Results

The key themes identified in the focus groups were:

(i) Current data discovery techniques depend on keyword strategies for searching, with Google the dominant tool. There is a need for training to support the use of any LLM-based resources.
(ii) Trust in LLMs was low, especially among academic researchers. Participants were concerned that results may be erroneous. Being able to ‘explain’ why a search result was returned was viewed as valuable.
(iii) Having a resource that collates all metadata in one place was powerful for helping researchers find data. This could be improved by using LLMs to summarise large quantities of information about datasets, making data discovery more efficient.

Our talk will detail steps towards addressing these challenges.

Conclusion

Although large language models can be useful for supporting federated data discovery among researchers, tools need to be responsible, trustworthy and open if researchers are going to use them.
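
The embedding-based search described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the catalogue records below are hypothetical, and `embed` is a simple word-count stand-in for a real embedding service (the abstract names OpenAI's), so it matches shared words only and does not capture meaning the way a learned embedding would. The ranking step, cosine similarity between a query vector and each metadata vector, is the part that carries over.

```python
import math
from collections import Counter

# Hypothetical metadata records, invented for illustration only.
CATALOGUE = [
    {"source": "catalogue A", "title": "School attainment records",
     "description": "pupil exam results and school attendance"},
    {"source": "catalogue B", "title": "Retail footfall counts",
     "description": "hourly pedestrian counts on high streets"},
    {"source": "catalogue C", "title": "Household energy survey",
     "description": "household energy use and heating costs"},
]


def embed(text):
    """Stand-in embedding: a sparse word-count vector.

    Assumption: a real system would call an embedding model here and
    get back a dense vector instead.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, records, top_k=2):
    """Rank records by similarity between the query and their metadata."""
    query_vec = embed(query)
    scored = []
    for record in records:
        text = record["title"] + " " + record["description"]
        scored.append((cosine(query_vec, embed(text)), record))
    # Sort on the score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for score, record in search("energy use in household heating", CATALOGUE):
        print(f"{score:.2f}  {record['source']}: {record['title']}")
```

Because all records are embedded into one shared vector space, a single query can rank datasets from several catalogues at once, which is the federated aspect the abstract emphasises.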

Green et al. studied this question.

synapsesocial.com/papers/68bb3d622b87ece8dc956756
https://doi.org/10.23889/ijpds.v10i4.3135