What does this research mean for the field?

Caddie improves ad hoc dataset retrieval by using content-based methods to handle the complexities of RDF data. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

This research aims to enhance ad hoc dataset retrieval (AHDR) by focusing on the content within RDF datasets.

February 26, 2026 | Open Access

Caddie: A prototype of content-based ad hoc RDF dataset retrieval

Xiaxia Wang (University of Oxford), Qiaosheng Chen (Nanjing University), Qing Shi (Nanjing University)

Key Points

This research aims to enhance ad hoc dataset retrieval (AHDR) by focusing on the content within RDF datasets.
Developed a prototype named Caddie for content-based AHDR.
Investigated three main tasks: dataset retrieval, deduplication, and snippet extraction.
Evaluated effectiveness on a public test collection and through a user study.
Demonstrated improvements in retrieval effectiveness.
Showed enhanced deduplication capabilities.
Provided practical insights from user evaluations.

Abstract

The rapid growth of open and structured RDF data on the Web has promoted the development of dataset search as an important research topic. The core function of existing systems is ad hoc dataset retrieval (AHDR) based on the metadata of datasets, which contains limited information and often suffers from quality issues. To overcome the limitations, in this article, we systematically investigate content-based AHDR to exploit the actual RDF data in datasets. We address three main tasks of content-based AHDR with novel methods for handling the large size and complex structure of RDF data to facilitate dataset retrieval, deduplication, and snippet extraction. These methods are integrated into an online and open-source prototype called Caddie. The effectiveness and practicability of its components are evaluated on a public test collection and by a user study.
