Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data; however, we lack an objective, coherent interpretation of them. To address this issue, we propose an evaluation framework with a single mathematical objective: the synthetic data should be drawn from the same distribution as the observed data. Through various structural decompositions of the objective, this framework allows us to reason, for the first time, about the completeness of any set of metrics, and it unifies existing metrics, including those that stem from fidelity considerations, downstream applications, and model-based approaches. Moreover, the framework motivates model-free baselines and a new spectrum of metrics. We evaluate structurally informed synthesizers and synthesizers powered by deep learning. Using our structured framework, we show that synthetic data generators that explicitly represent tabular structure outperform other methods, especially on smaller datasets.
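As a rough illustration of the objective described in the abstract, it can be written out in assumed notation: $P$ for the distribution of the observed data, $Q$ for the distribution the synthesizer samples from, and $x_1, \dots, x_d$ for the table's columns. None of these symbols come from the abstract, and the paper's own decompositions may differ from the chain-rule factorization used here.

```latex
% Objective: synthetic rows follow the same joint distribution as observed rows.
Q(x_1, \dots, x_d) = P(x_1, \dots, x_d)

% One structural decomposition of the joint (chain rule, for a fixed column order):
P(x_1, \dots, x_d) = \prod_{j=1}^{d} P(x_j \mid x_1, \dots, x_{j-1})

% Matching every univariate marginal is implied by the objective,
% but does not imply it:
Q = P \;\Longrightarrow\; Q(x_j) = P(x_j) \quad \text{for all } j,
\qquad \text{while the converse fails in general.}
```

Read this way, a metric set that compares only per-column marginals checks a necessary condition and leaves the dependence between columns untested, which is the kind of gap a completeness argument over a decomposition can expose.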
Scott Cheng‐Hsin Yang
Rutgers, The State University of New Jersey
Baxter Eaves
Network for Business Sustainability
Michael Schmidt
University of Kassel
Yang et al. (Fri,) studied this question.
synapsesocial.com/papers/68e73dcfb6db6435876b74d7 — DOI: https://doi.org/10.48550/arxiv.2403.10424