• A Data Spaces framework for data preparation based on standardized W3C technologies.
• Metadata extraction for provenance and FAIRification of data, expressed in RDF triples.
• Supports the discovery of datasets using semantically enriched queries on the metadata.

Despite recent advancements in standardized data exchange and governance, facilitated by the concept of data spaces, data providers still lack essential tools for participating in data spaces. This limitation stems primarily from the fact that data is typically collected or generated in arbitrary formats, with diverse types, flexible and evolving schemas, and different data modalities (text, image, video, time series, etc.). Consequently, semantic interoperability and compliance with the FAIR principles are often overlooked, which compromises the utility of shareable data. In this work, we propose a framework that transforms raw or semi-structured data into trustworthy data made available via a data space. Our approach facilitates accessibility, interoperability, and reuse by generating an RDF representation of a given dataset that complies with a specified input ontology describing the application domain. The framework also produces valuable metadata during data transformation, which is registered in a catalog to support the findability of the datasets. Furthermore, a change tracking algorithm detects modifications in the data between consecutive versions of a dataset, which helps users identify the most suitable dataset and version for each use case. We evaluate the applicability of our framework in a real-world use case from the urban domain that involves multiple diverse datasets. The proposed framework enables the seamless onboarding of new participants in data spaces.
Santipantakis et al. (Sun,) studied this question.
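
To make the two central ideas of the abstract concrete (generating an RDF representation of a dataset, and tracking changes between consecutive versions), the following is a minimal sketch and not the authors' implementation. It assumes tabular records with an `id` field, a hypothetical `http://example.org/` namespace, and a hand-written `PROPERTY_MAP` standing in for the input ontology. Change tracking is approximated here by a plain set difference over triples, which may differ from the algorithm used in the paper. The parking records in the usage note are invented for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject IRI, predicate IRI, literal value)

# Hypothetical namespace and field-to-property mapping (assumptions, not from the paper).
# In a real deployment these would be derived from the input ontology.
BASE = "http://example.org/"
PROPERTY_MAP = {
    "name": BASE + "ont/name",
    "capacity": BASE + "ont/capacity",
}


def rows_to_triples(rows: Iterable[Dict[str, str]], id_field: str = "id") -> Set[Triple]:
    """Map each record to triples, skipping fields the mapping does not cover."""
    triples: Set[Triple] = set()
    for row in rows:
        subject = f"{BASE}resource/{row[id_field]}"
        for field, value in row.items():
            if field == id_field or field not in PROPERTY_MAP:
                continue
            triples.add((subject, PROPERTY_MAP[field], str(value)))
    return triples


def diff_versions(old: Set[Triple], new: Set[Triple]) -> Dict[str, Set[Triple]]:
    """Report triples that appear only in the new or only in the old version."""
    return {"added": new - old, "removed": old - new}


def escape_literal(value: str) -> str:
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples: Set[Triple]) -> str:
    """Serialize triples as N-Triples lines with plain string literals as objects."""
    return "\n".join(
        f'<{s}> <{p}> "{escape_literal(o)}" .' for s, p, o in sorted(triples)
    )
```

As a usage example, converting `[{"id": "p1", "name": "Central Parking", "capacity": "120"}]` and a later version in which the capacity becomes `"150"` yields one removed triple (the old capacity) and one added triple (the new capacity), while the unchanged name triple appears in neither set. A production pipeline would more likely rely on an RDF library and typed literals; plain tuples are used here only to keep the sketch dependency-free.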