Dataset search is a long-standing problem across both industry and academia. While most industry tools focus on identifying one or more datasets matching a user-specified query, most recent academic papers focus on the subsequent problems of join and union discovery for a given dataset. In this paper, we take a step back and ask: is the task of identifying an initial dataset really a solved problem? Are join and union discovery indeed the most pressing problems to work on? To answer these questions, we survey 89 data professionals and surface the objectives, processes, and tools used to search for structured datasets, along with the challenges faced when using existing systems. We uncover characteristics of data content and metadata that are most important for data professionals during search, such as granularity and data freshness. Informed by our analysis, we argue that dataset search is not yet a solved problem, but is, in fact, difficult to solve. To move the needle in the right direction, we distill four desiderata for future dataset search systems: iterative interfaces, hybrid querying, task-driven search and result diversity.
Madelon Hulsebos
Centrum Wiskunde & Informatica
W. Lin
Chengdu Institute of Information Technology (China)
Shreya Shankar
University of California, Berkeley
Berkeley College
Hulsebos et al. studied this question.
synapsesocial.com/papers/68e64b29b6db6435875db8a8 — DOI: https://doi.org/10.1145/3665939.3665959