What question did this study set out to answer?

This work aims to enhance data usability in data spaces by transforming raw and semi-structured data into trustworthy, interoperable formats.

March 13, 2026 · Open Access

Semantic Data Transformation, FAIRification and Provenance for Data Spaces

Key Points

Developed a framework for data preparation based on standardized W3C technologies.
Implemented metadata extraction for data provenance and FAIRification, expressed in RDF triples.
Applied a change-tracking algorithm to detect modifications between consecutive dataset versions.
Evaluated the framework with a real-world urban data scenario involving multiple datasets.
Facilitated the transformation of disparate data formats into a semantically enriched RDF representation.
Improved the findability and usability of datasets by generating metadata during transformation and registering it in a catalog.
Enabled seamless onboarding and participation of new data providers in data spaces.
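The transformation and metadata steps summarized above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the domain namespace `http://example.org/urban#`, the field mapping `FIELD_MAP`, and the helpers `transform` and `provenance` are hypothetical. The RDF, PROV-O, DCAT, DCMI terms and XSD namespace IRIs are the real W3C/DCMI ones. Each record is mapped to ontology properties and serialized as N-Triples, and the output is described as a `dcat:Dataset` generated by a `prov:Activity`.

```python
# Minimal sketch, not the paper's implementation: map semi-structured records to
# RDF (N-Triples) under a hypothetical domain ontology and emit provenance
# metadata using real W3C/DCMI vocabularies (PROV-O, DCAT, DCMI terms).
from datetime import datetime, timezone

# Real vocabulary namespaces.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical domain ontology and data namespaces (assumptions for illustration).
ONTOLOGY = "http://example.org/urban"
ONTO = ONTOLOGY + "#"
DATA = "http://example.org/data/"

# Hypothetical mapping: source field -> (ontology property IRI, XSD datatype).
FIELD_MAP = {
    "station": (ONTO + "stationName", "string"),
    "pm10": (ONTO + "pm10Concentration", "decimal"),
}


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value, datatype: str) -> str:
    # Escape characters that may not appear raw in an N-Triples string literal.
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"^^<{XSD}{datatype}>'


def transform(dataset_id: str, records: list[dict]) -> list[str]:
    """Type each record as onto:Observation and map known fields to properties.
    Unmapped fields and None values are skipped in this sketch."""
    triples = []
    for i, rec in enumerate(records):
        subj = iri(f"{DATA}{dataset_id}/obs/{i}")
        triples.append(f"{subj} {iri(RDF_TYPE)} {iri(ONTO + 'Observation')} .")
        for name, value in rec.items():
            if name in FIELD_MAP and value is not None:
                prop, dtype = FIELD_MAP[name]
                triples.append(f"{subj} {iri(prop)} {literal(value, dtype)} .")
    return triples


def provenance(dataset_id: str, source_iri: str) -> list[str]:
    """Describe the output as a dcat:Dataset produced by a prov:Activity that
    used the raw source; dct:conformsTo points at the domain ontology."""
    ds = iri(f"{DATA}{dataset_id}")
    act = iri(f"{DATA}{dataset_id}/transformation")
    ended = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        f"{ds} {iri(RDF_TYPE)} {iri(DCAT + 'Dataset')} .",
        f"{ds} {iri(DCT + 'conformsTo')} {iri(ONTOLOGY)} .",
        f"{ds} {iri(PROV + 'wasGeneratedBy')} {act} .",
        f"{act} {iri(RDF_TYPE)} {iri(PROV + 'Activity')} .",
        f"{act} {iri(PROV + 'used')} {iri(source_iri)} .",
        f"{act} {iri(PROV + 'endedAtTime')} {literal(ended, 'dateTime')} .",
    ]


if __name__ == "__main__":
    rows = [{"station": "Central", "pm10": 21.5}, {"station": "Harbour", "pm10": None}]
    for line in transform("air-2024", rows) + provenance("air-2024", "http://example.org/raw/air.csv"):
        print(line)
```

In this example the second record yields only its type and station triples, because its `pm10` value is `None`. A production pipeline would more likely rely on an RDF library than on hand-built strings; plain strings are used here only to keep the sketch dependency-free.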

Abstract

• A Data Spaces framework for data preparation based on standardized W3C technologies.
• Metadata extraction for provenance and FAIRification of data, expressed in RDF triples.
• Support for the discovery of datasets using semantically enriched queries on the metadata.

Despite recent advancements in standardized data exchange and governance, facilitated by the concept of data spaces, data providers still lack essential tools to participate in data spaces. This limitation stems primarily from the fact that data is typically collected or generated in arbitrary formats, with diverse data types, flexible and evolving schemas, and different data modalities (text, image, video, time series, etc.). Consequently, semantic interoperability and compliance with the FAIR principles are often overlooked, which compromises the utility of shareable data. In this work, we propose a framework that transforms raw or semi-structured data into trustworthy data made available via a data space. Our approach facilitates accessibility, interoperability and reuse by generating an RDF representation of a given dataset that complies with a specified input ontology describing the application domain. The proposed framework also produces valuable metadata during data transformation, which is registered in a catalog to support the findability of the datasets. Furthermore, a change tracking algorithm is applied to detect modifications in the data between consecutive versions of datasets, helping users identify the most suitable dataset and version for each use case. We evaluate the applicability of our framework in a real-world use case scenario from the urban domain that involves multiple diverse datasets. The proposed framework enables the seamless onboarding of new participants in data spaces.
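The abstract does not specify how the change tracking algorithm works. One simple way to detect modifications between consecutive versions of a tabular dataset is a keyed, record-level diff, sketched here in standard-library Python. This is a swapped-in illustration, not the authors' algorithm: it assumes every record carries a stable identifier column (passed as the hypothetical `key` parameter), and the names `ChangeSet` and `diff_versions` are hypothetical.

```python
# Minimal sketch of version-to-version change tracking via a keyed diff.
# Assumption (not from the paper): each record has a unique, stable key column.
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    added: list = field(default_factory=list)     # keys present only in the new version
    removed: list = field(default_factory=list)   # keys present only in the old version
    modified: dict = field(default_factory=dict)  # key -> {column: (old value, new value)}

    def is_empty(self) -> bool:
        return not (self.added or self.removed or self.modified)


def diff_versions(old: list[dict], new: list[dict], key: str) -> ChangeSet:
    """Compare two versions of a tabular dataset record by record."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}
    changes = ChangeSet(
        added=sorted(new_by_key.keys() - old_by_key.keys()),
        removed=sorted(old_by_key.keys() - new_by_key.keys()),
    )
    for k in sorted(old_by_key.keys() & new_by_key.keys()):
        before, after = old_by_key[k], new_by_key[k]
        # Column-level delta; .get() returns None for a missing column,
        # so in this sketch a missing column and a null value look the same.
        delta = {
            col: (before.get(col), after.get(col))
            for col in sorted(before.keys() | after.keys())
            if before.get(col) != after.get(col)
        }
        if delta:
            changes.modified[k] = delta
    return changes


if __name__ == "__main__":
    v1 = [{"id": "s1", "pm10": 21.5}, {"id": "s2", "pm10": 30.0}]
    v2 = [{"id": "s1", "pm10": 22.0}, {"id": "s3", "pm10": 18.2}]
    print(diff_versions(v1, v2, key="id"))
```

For the two versions in the example, the result lists `s3` as added, `s2` as removed and `s1` as modified (`pm10` from 21.5 to 22.0). Under the same assumptions, a change set of this shape could be stored alongside a version's catalog metadata so that users can judge which version suits their use case.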
