What question did this study set out to answer?

The aim is to improve data loading and augmentation in deep learning workflows, especially for non-standard data formats.

May 8, 2026 · Open Access

A parallel framework for data input pipelines and online data augmentation in deep learning

Key Points

Aimed to improve data loading and augmentation in deep learning workflows, especially for non-standard data formats.
Developed a parallel framework built on global shared memory and a ring buffer architecture.
Enabled multiple CPU workers to load and preprocess batches in parallel while bypassing the Python GIL and memory bottlenecks.
Validated on sign language recognition and hyperspectral image classification tasks.
Achieved up to 27× acceleration in isolated data ingestion and up to 28× in end-to-end training relative to standard sequential baselines.
Delivered up to 8× faster data loading and up to 7× faster full training than natively optimised parallel TensorFlow and PyTorch pipelines in memory-intensive scenarios.

Abstract

Efficient data ingestion and online data augmentation remain challenges in deep learning workflows, particularly when dealing with datasets containing non-standard formats or massive multidimensional arrays that natively optimised functions cannot fully manage. This work presents a parallel framework that integrates global shared memory through a ring buffer architecture, enabling high-throughput data loading and flexible on-the-fly augmentation. The framework decouples data production from consumption, allowing multiple CPU workers to load and preprocess batches in parallel while completely bypassing the Python GIL and memory bottlenecks. Crucially, the framework supports both CPU-side and GPU-side augmentation strategies, adapting to whether complex conditional transformations or framework-native operations are required. The proposed approach was validated on two representative tasks: (i) sign language recognition from human pose CSV sequences, and (ii) hyperspectral image classification using massive arrays. Relative to standard sequential baselines, the proposed framework achieved up to 27× acceleration in isolated data ingestion and up to 28× in end-to-end training. Importantly, even against natively optimised parallel TensorFlow and PyTorch pipelines, it still delivered up to 8× faster data loading and up to 7× faster full training in memory-intensive scenarios. Overall, the proposed framework provides a scalable, multi-GPU compatible solution for deep learning pipelines, showing robust performance across both I/O-bound and memory-constrained scenarios in TensorFlow and PyTorch while alleviating memory fragmentation and allocation constraints.
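The abstract describes the architecture only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes Python's standard `multiprocessing.shared_memory` module, a POSIX system with the `fork` start method, and a fixed batch size. All names (`SLOTS`, `BATCH_LEN`, `load_and_augment`, `producer`, `run_demo`) are hypothetical. Several worker processes write batches directly into slots of one shared block, and the consumer reads the slots in ring order, so no batch is pickled or sent through a pipe.

```python
import multiprocessing as mp
from array import array
from multiprocessing import shared_memory

SLOTS = 4                                     # ring capacity in batches (hypothetical)
BATCH_LEN = 128                               # float32 values per batch (hypothetical)
BATCH_BYTES = BATCH_LEN * array("f").itemsize


def load_and_augment(batch_id):
    """Stand-in for real loading and augmentation: a batch filled with its own id."""
    return array("f", [float(batch_id)] * BATCH_LEN)


def producer(shm_name, batch_ids, free, ready, head):
    """Worker process: prepare batches and write them into the shared ring."""
    shm = shared_memory.SharedMemory(name=shm_name)   # attach to the existing block
    try:
        for batch_id in batch_ids:
            data = load_and_augment(batch_id).tobytes()
            free.acquire()                            # wait until some slot is empty
            with head.get_lock():                     # claim the next slot in ring order
                slot = head.value
                head.value = (slot + 1) % SLOTS
            start = slot * BATCH_BYTES
            shm.buf[start:start + BATCH_BYTES] = data
            ready[slot].release()                     # mark this slot as readable
    finally:
        shm.close()


def run_demo(num_workers=3, num_batches=12):
    """Consume num_batches batches produced by num_workers processes."""
    ctx = mp.get_context("fork")                      # assumption: POSIX system
    shm = shared_memory.SharedMemory(create=True, size=SLOTS * BATCH_BYTES)
    free = ctx.Semaphore(SLOTS)                       # counts empty slots
    ready = [ctx.Semaphore(0) for _ in range(SLOTS)]  # one "filled" flag per slot
    head = ctx.Value("i", 0)                          # next slot to be claimed
    workers = [
        ctx.Process(
            target=producer,
            args=(shm.name, range(w, num_batches, num_workers), free, ready, head),
        )
        for w in range(num_workers)
    ]
    seen = []
    try:
        for p in workers:
            p.start()
        tail = 0
        for _ in range(num_batches):
            ready[tail].acquire()                     # wait for the oldest claimed slot
            start = tail * BATCH_BYTES
            batch = array("f", bytes(shm.buf[start:start + BATCH_BYTES]))
            free.release()                            # slot may now be reused
            tail = (tail + 1) % SLOTS
            seen.append(int(batch[0]))                # a training step would go here
        for p in workers:
            p.join()
    finally:
        shm.close()
        shm.unlink()
    return seen


if __name__ == "__main__":
    print(sorted(run_demo()))
```

The per-slot `ready` semaphores matter when several producers share the ring: a worker that claims a later slot may finish first, and the consumer must still wait for the slot at `tail` before reading it. The order in which batch ids arrive depends on scheduling, so the sketch only guarantees that each batch is delivered exactly once.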

Cite this study

De Toro-Castro et al. studied this question.

synapsesocial.com/papers/69fd7ee0bfa21ec5bbf073af — DOI: https://doi.org/10.1007/s11227-026-08543-0

Authors

Antonio De Toro-Castro

University of Almería

Marcos Lupión

University of Ulster

Vicente González-Ruíz

University of Almería

Journals

The Journal of Supercomputing
