May 6, 2026 · Open Access

Data-Centric AI Manifesto: How Data Quality Drives Modern AI

Key Points

The aim is to establish the significance of data quality in modern AI development.
Adoption of the Data-Centric Artificial Intelligence (DCAI) framework.
Conceptual analysis distinguishing DCAI from model-centric approaches.
Presentation of a comprehensive data-centric lifecycle for AI.
Exploration of techniques like semantic data representation and continuous monitoring.
Improvement in performance and robustness of AI systems through enhanced data practices.
Increased interpretability and regulatory compliance by focusing on data quality.
Support for responsible deployment of generative models through systematic data management.

Abstract

Artificial Intelligence (AI) has traditionally been developed according to a model-centric paradigm, in which progress is driven by increasingly sophisticated learning architectures applied to largely fixed datasets. However, this paradigm exhibits well-known limitations, including sensitivity to label noise, distribution shifts, adversarial perturbations, and limited transparency and reproducibility. These issues indicate that many of the current bottlenecks of AI systems arise from deficiencies in data rather than from model design. In this paper, we adopt and formalize the Data-Centric Artificial Intelligence (DCAI) paradigm, which places data quality, semantic consistency, and representativeness at the core of the AI lifecycle. From this perspective, performance, robustness, interpretability, and regulatory compliance are primarily achieved through systematic data engineering, including data curation, enrichment, validation, and continuous monitoring, rather than through repeated model re-engineering. The contributions of this work are threefold. First, a conceptual framework is provided to clarify the epistemic and methodological foundations of DCAI and distinguish it from traditional model-centric approaches. Second, a data-centric lifecycle is presented, covering training data development, inference data design, and data maintenance, and integrating techniques such as semantic data representation, active learning, synthetic data generation, and drift-aware quality control. Third, the role of DCAI in the context of Generative AI is analyzed, showing how data-centric practices are essential to ensure robustness, accountability, and responsible deployment of large-scale generative models. Overall, this work positions DCAI as a coherent methodological and technological framework for the development of trustworthy, resilient, and sustainable AI systems, making a research contribution and providing a reference model for industrial and regulatory contexts.
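The abstract names drift-aware quality control as part of data maintenance, but this page does not describe a specific method. As an illustration only, the sketch below uses the Population Stability Index (PSI), a common generic drift score that is swapped in here and is not taken from the paper. The function names `psi` and `has_drifted`, the bin count, and the 0.2 threshold (a widely used rule of thumb) are all assumptions.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a current sample.

    Illustrative drift score only; not the method of the paper.
    """
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if the range is zero.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the reference range
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def has_drifted(reference: Sequence[float], current: Sequence[float],
                threshold: float = 0.2) -> bool:
    """Flag drift when PSI exceeds an assumed rule-of-thumb threshold."""
    return psi(reference, current) > threshold


if __name__ == "__main__":
    training_feature = [i / 100 for i in range(100)]         # roughly uniform on [0, 1)
    serving_feature = [0.5 + i / 200 for i in range(100)]    # shifted toward the upper half
    print(has_drifted(training_feature, training_feature))
    print(has_drifted(training_feature, serving_feature))
```

In a monitoring loop, a check like this would run per feature on each new batch of inference data, with flagged features sent back to curation or relabeling rather than triggering a model redesign.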

Authors

Donato Malerba

Antonella Poggi

Mario Alviano

Journals

Electronics

Institutions

Sapienza University of Rome

University of Naples Federico II

University of Bari Aldo Moro
