What question did this study set out to answer?

The aim is to create a unified optimization framework for deep learning that addresses both training and inference challenges while considering hardware limitations.

March 12, 2026 | Open Access

Structural optimization for efficient deep learning: A unified framework from inference to training

Key points

Developed HALOC for hardware-aware neural architecture search optimizing inference.
Introduced COMCAT to identify head-level low-rank structure for model size reduction.
Extended low-rank principles to training optimization with COAP and EcoSpa.
Evaluated across various vision, language, and multimodal tasks.
HALOC achieved a 72% reduction in FLOPs and a 1.5× inference speedup.
COMCAT reduced model size by 50% while outperforming conventional pruning methods.
COAP reduced optimizer memory usage by up to 81% without loss of convergence.
EcoSpa cut training memory by 50% and improved training speed by 21% for LLaMA-1B.

Abstract

Deep neural network scalability is bottlenecked by the disparity between exponential model growth and hardware constraints. Unlike existing methods that optimize inference or training in isolation, this dissertation proposes a Unified Framework for Structural Optimization, systematically exploiting intrinsic low-rank properties throughout the model lifecycle. For inference, we exploit static low-rank structures to address weight redundancy. HALOC formulates rank selection as hardware-aware neural architecture search, navigating the combinatorial rank space to achieve 72% FLOPs reduction and 1.5× speedup. COMCAT uncovers head-level low-rank structure, a granularity overlooked by conventional matrix decomposition, reducing model size by 50% while outperforming standard pruning methods. For training, we extend low-rank principles to dynamic optimization. COAP resolves trajectory discontinuities by preserving inter-projection correlations, reducing optimizer memory by up to 81% without convergence loss. Similarly, EcoSpa enforces coupled low-rank sparsity to preserve Transformer multiplicative interactions, reducing training memory by 50% and speeding up wall-clock training by 21% for LLaMA-1B. Evaluated across representative vision, language, and multimodal tasks, this framework establishes low-rankness as a pervasive attribute of deep learning dynamics. By reconciling algorithmic designs with hardware constraints, it provides a systematic methodology for the hardware-conscious optimization of large-scale models.
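To make the rank-selection trade-off in the abstract concrete, the sketch below works through the basic arithmetic of low-rank factorization: replacing an m × n weight matrix W with a product U V, where U is m × r and V is r × n, changes the parameter count (and the multiply-accumulates per input vector) from m·n to r·(m + n). This is generic bookkeeping, not the HALOC or COMCAT procedure. The function names and the 1024 × 1024 layer size are hypothetical and chosen only for illustration, and the ranks it prints are not results from the dissertation.

```python
def dense_cost(m: int, n: int) -> int:
    """Parameters (equivalently, multiply-accumulates per input vector)
    of a dense m x n weight matrix."""
    return m * n


def low_rank_cost(m: int, n: int, r: int) -> int:
    """Cost of the factorized form U (m x r) @ V (r x n)."""
    return r * (m + n)


def break_even_rank(m: int, n: int) -> int:
    """Largest rank at which the factorized form is strictly cheaper
    than the dense matrix: r * (m + n) < m * n."""
    return (m * n - 1) // (m + n)


def max_rank_for_reduction(m: int, n: int, reduction_pct: int) -> int:
    """Largest rank whose factorized cost is at most
    (100 - reduction_pct)% of the dense cost.

    Integer arithmetic is used to avoid floating-point rounding."""
    budget = dense_cost(m, n) * (100 - reduction_pct) // 100
    return budget // (m + n)


if __name__ == "__main__":
    m = n = 1024  # hypothetical layer size, not taken from the dissertation
    print(break_even_rank(m, n))             # 511
    print(max_rank_for_reduction(m, n, 50))  # 256
```

A single layer's budget is the easy part. The abstract's point is that choosing a rank for every layer at once is a combinatorial problem, and that a reduction in FLOPs does not automatically translate into the same reduction in latency on real hardware, which is why rank selection is framed there as a hardware-aware search.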

Cite This Study

Jinqi Xiao studied this question.

synapsesocial.com/papers/69b258a396eeacc4fcec8742 https://doi.org/10.7282/t3-v8de-qy35
