Small sample sizes in preclinical research limit the extraction of reliable knowledge and hinder translational progress. We propose genESOM, a generative artificial intelligence method based on emergent self‑organizing maps. genESOM is designed to augment small biomedical datasets while controlling α‑error inflation. It separates structure learning from data synthesis and mitigates error propagation through dimensionality modulation, enabling safe and interpretable data augmentation. Using lipid signaling data from a preclinical multiple sclerosis study employing the experimental autoimmune encephalomyelitis (EAE) model (26 female SJL/J mice, three treatment groups, and 62 lipid mediators), we intentionally reduced the sample size from 26 to 18 animals. After this reduction, neither statistical nor machine learning analyses could detect group differences. Augmenting the reduced dataset with AI‑generated cases restored treatment‑specific segregation and recovered the original key lipid mediators. genESOM achieved consistent fidelity without introducing false positives. In contrast, Gaussian mixture and conditional GAN models failed under comparable constraints. These results demonstrate that genESOM provides a robust, error‑controlled framework for enhancing knowledge extraction from limited preclinical samples. While synthetic augmentation cannot substitute for biological replication, it can support exploratory analyses and help reduce the need for additional animal experimentation.
Lötsch et al. studied this question.