What question did this study set out to answer?

This review explores machine learning methodologies designed for small and limited datasets, along with the challenges that data scarcity raises, such as overfitting and generalization error.

January 18, 2026 (Open Access)

A review of machine learning with small and limited data

Key Points

Examined theoretical foundations of learning from small data
Discussed advancements in training and evaluation methods
Explored algorithmic architectures such as meta-learning and transfer learning
Identified issues such as overfitting and generalization error
Assessed synthetic data generation techniques and their implications
Highlighted the importance of addressing overfitting in small datasets
Showcased advancements in transfer learning and simulation-based methods
Identified trends integrating domain knowledge to enhance model performance
Determined minimal interventions to improve learning outcomes with scarce data

Abstract

The abundance of large datasets has driven breakthroughs in machine learning (ML) model performance and scalability. However, many domains and practical applications must contend with the limitations imposed by small and very small datasets. This survey examines state-of-the-art methodologies and challenges in ML approaches tailored for scenarios where data scarcity is a fundamental constraint. We begin by outlining the theoretical foundations that govern learning from small data. Then, we discuss recent advancements in data-related frameworks (e.g., training and evaluation methods) and algorithmic architectures (meta-learning and transfer learning). We also explore the trade-offs and related issues inherent in designing models for small data, such as overfitting, generalization error, and the bias-variance dilemma, and identify minimal interventions that can overcome such issues. Further, this survey covers the role of synthetic data generation and simulation-based approaches in enlarging data availability, while critically assessing the implications of these techniques for model performance. Finally, in synthesizing the open literature, we shed light on emerging trends and research directions that aim to overcome challenges arising from limited data, such as incorporating domain knowledge and causal principles to guide the learning process and integrating symbolic reasoning with statistical learning.
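To make the overfitting and generalization-error concerns above concrete, the following is a minimal sketch, not taken from the paper. It assumes a hypothetical eight-point toy dataset and uses leave-one-out cross-validation as one possible small-data evaluation method (the abstract does not name a specific one). It compares a simple linear fit with a 1-nearest-neighbour regressor that memorises its training data. The names `DATA`, `fit_line`, `fit_1nn`, `mse`, and `loo_mse` are illustrative only.

```python
from statistics import mean

# Hypothetical toy dataset (an assumption, not from the paper):
# eight (x, y) pairs that roughly follow y = 2x with small noise.
DATA = [(0.0, 0.1), (1.0, 2.3), (2.0, 3.8), (3.0, 6.4),
        (4.0, 7.7), (5.0, 10.5), (6.0, 11.6), (7.0, 14.2)]


def fit_line(points):
    """Ordinary least squares for y = a*x + b; returns a predictor function."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    a = sxy / sxx
    b = y_bar - a * x_bar
    return lambda x: a * x + b


def fit_1nn(points):
    """1-nearest-neighbour regressor: predicts the y of the closest stored x."""
    def predict(x):
        return min(points, key=lambda p: abs(p[0] - x))[1]
    return predict


def mse(model, points):
    """Mean squared error of a fitted model on the given points."""
    return mean((model(x) - y) ** 2 for x, y in points)


def loo_mse(fit, points):
    """Leave-one-out CV: fit on n-1 points, score on the single held-out point."""
    errors = []
    for i, (x, y) in enumerate(points):
        train = points[:i] + points[i + 1:]
        errors.append((fit(train)(x) - y) ** 2)
    return mean(errors)


if __name__ == "__main__":
    for name, fit in [("linear", fit_line), ("1-NN", fit_1nn)]:
        train_err = mse(fit(DATA), DATA)
        held_out_err = loo_mse(fit, DATA)
        print(f"{name}: train MSE={train_err:.3f}, leave-one-out MSE={held_out_err:.3f}")
```

On this toy data the memorising model reaches zero training error yet has a larger leave-one-out error than the linear fit, which is the gap between training and generalization error that the abstract refers to. With only a handful of samples, leave-one-out reuses every point for both fitting and testing, which is one reason it is a common choice when data are scarce.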

Authors

M. Z. Naser

Clemson University

Journals

Journal of Big Data

Institutions

Clemson University
