Abstract The abundance of large datasets has driven breakthroughs in machine learning (ML) model performance and scalability. However, many domains and practical applications must contend with the limitations imposed by small and very small datasets. This survey thoroughly examines state-of-the-art methodologies and challenges in ML approaches tailored for scenarios where data scarcity is a fundamental constraint. We begin by outlining the theoretical foundations that govern learning from small data. Then, we discuss recent advancements in data-related frameworks (e.g., training and evaluation methods) and algorithmic architectures (meta-learning and transfer learning). We also explore the trade-offs and related issues inherent in designing models for small data, such as overfitting, generalization error, and the bias-variance dilemma, and identify minimal interventions that can overcome such issues. Further, this survey covers the role of synthetic data generation and simulation-based approaches in enlarging data availability while critically assessing the implications of these techniques for model performance. Finally, in synthesizing the open literature, we shed light on emerging trends and research directions that aim to overcome challenges arising from limited data, such as incorporating domain knowledge and causal principles to guide the learning process and integrating symbolic reasoning with statistical learning.
M. Z. Naser
Clemson University
Journal of Big Data
M. Z. Naser (Fri,) studied this question.
synapsesocial.com/papers/696c776ceb60fb80d1395a50 — DOI: https://doi.org/10.1186/s40537-025-01346-9