April 23, 2026 | Open Access

Towards a FAIR-compliant Harmonised AI-based Automatic Metadata for Climate Research

Key Points

Creating an automated, FAIR-compliant metadata system to improve climate data handling.
Developing a work plan based on interdisciplinary collaboration between climate and computer science experts.
Leveraging natural language processing and knowledge graphs to improve metadata quality.
Creating a machine-actionable metadata set that facilitates the integration of scientific data.
Identifying common gaps in metadata compliance with the FAIR principles.
Developing tools for automatic metadata generation and alignment.
Providing a sustainable, openly accessible metadata set for the research community.

Abstract

In the petabyte era, climate research deals with large and extremely large datasets on a daily basis. Filling in the metadata that accompanies climate datasets is often challenging: it is time-consuming, error-prone, and frequently leaves the record incomplete. Arguably, most researchers fill in only the minimal set of metadata required to publish their data (i.e. software, publication), mostly because of time constraints. The metadata fields are also not filled in consistently. For the institution, for example, an abbreviation is sometimes used and the full name at other times. There are multiple inconsistencies in lower and upper case. Moreover, users do not always choose the same names for the same variables they are describing. In many cases there are gaps in compliance with the FAIR principles (findable, accessible, interoperable, reusable). In this talk, we present the idea of automatic, AI-based, FAIR-compliant metadata for climate research to address these challenges. Based on an interdisciplinary collaboration within the Leibniz Science Campus “Digital Transformation of Research” (DiTraRe), we created a work plan connecting researchers from the climate domain with computer science experts and infrastructure providers (RADAR). Within this framework, we aim to develop a scalable infrastructure that leverages natural language processing (NLP), knowledge graphs, and large language models (LLMs) to support the harmonisation and semantic alignment of metadata in climate research repositories. Our output will be a curated, machine-actionable metadata set that can support both the integration of scientific data and downstream AI research. We aim to deliver not only technical tools but also sustainable resources for the community, including an openly accessible metadata set and methods for its continuous extension and reuse.
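The inconsistencies listed in the abstract (abbreviated versus full institution names, mixed case, differing variable names, and unfilled fields) can be made concrete with a small sketch. This is not the authors' method, which is planned around NLP, knowledge graphs, and LLMs; it is a plain lookup-table illustration of what harmonisation has to achieve. The alias tables, the field names `institution` and `variable`, the list of required fields, and the functions `harmonise` and `missing_fields` are all assumptions made for this example.

```python
# Illustrative alias tables (assumed for this sketch, not taken from the paper).
INSTITUTION_ALIASES = {
    "kit": "Karlsruhe Institute of Technology",
    "karlsruhe institute of technology": "Karlsruhe Institute of Technology",
}

VARIABLE_ALIASES = {
    "tas": "air_temperature",
    "temp": "air_temperature",
    "air temperature": "air_temperature",
    "pr": "precipitation_flux",
    "precip": "precipitation_flux",
}


def _key(value: str) -> str:
    """Collapse whitespace and case so that ' KIT ' and 'kit' compare equal."""
    return " ".join(value.split()).casefold()


def harmonise(record: dict) -> dict:
    """Return a copy of the record with known aliases mapped to one canonical form.

    Values that are not in the alias tables are left unchanged.
    """
    out = dict(record)
    if "institution" in out:
        out["institution"] = INSTITUTION_ALIASES.get(
            _key(out["institution"]), out["institution"]
        )
    if "variable" in out:
        out["variable"] = VARIABLE_ALIASES.get(_key(out["variable"]), out["variable"])
    return out


def missing_fields(record: dict, required=("title", "institution", "variable", "license")) -> list:
    """List required fields that are absent or empty (hypothetical minimal set)."""
    return [field for field in required if not str(record.get(field, "")).strip()]


if __name__ == "__main__":
    raw = {"title": "Regional run 01", "institution": " kit ", "variable": "Temp"}
    print(harmonise(raw))
    print(missing_fields(raw))
```

A hand-maintained table like this does not scale to the variety found in real repositories, which is the motivation the abstract gives for automated, AI-based alignment.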
