The scalability of deep neural networks is bottlenecked by the gap between exponential model growth and hardware constraints. Unlike existing methods that optimize inference or training in isolation, this dissertation proposes a Unified Framework for Structural Optimization that systematically exploits intrinsic low-rank properties throughout the model lifecycle. For inference, we exploit static low-rank structures to address weight redundancy. HALOC formulates rank selection as a hardware-aware neural architecture search, navigating the combinatorial rank space to achieve a 72% FLOPs reduction and a 1.5× speedup. COMCAT uncovers head-level low-rank structure, a granularity overlooked by conventional matrix decomposition, reducing model size by 50% while outperforming standard pruning methods. For training, we extend low-rank principles to dynamic optimization. COAP resolves trajectory discontinuities by preserving inter-projection correlations, reducing optimizer memory by up to 81% without convergence loss. Similarly, EcoSpa enforces coupled low-rank sparsity to preserve the multiplicative interactions of Transformers, reducing training memory by 50% and speeding up wall-clock training by 21% for LLaMA-1B. Evaluated across representative vision, language, and multimodal tasks, the framework establishes low-rankness as a pervasive attribute of deep learning dynamics. By reconciling algorithmic design with hardware constraints, it provides a systematic methodology for the hardware-conscious optimization of large-scale models.
Jinqi Xiao (Thu,) studied this question.
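As a rough illustration of why low-rank structure reduces cost, the sketch below counts parameters for a dense m × n weight matrix and for a generic rank-r factorization W ≈ UV. This is textbook arithmetic under assumed shapes, not the rank-selection procedure of HALOC or any other method named above; the function names are hypothetical.

```python
def dense_cost(m: int, n: int) -> int:
    """Parameters (and multiply-accumulates per input vector) of a dense m x n layer."""
    return m * n


def low_rank_cost(m: int, n: int, r: int) -> int:
    """Same count for the factorized layer W ~ U @ V, with U of shape m x r and V of shape r x n."""
    return r * (m + n)


def reduction(m: int, n: int, r: int) -> float:
    """Fraction of parameters removed by the rank-r factorization (negative if it costs more)."""
    return 1.0 - low_rank_cost(m, n, r) / dense_cost(m, n)


def break_even_rank(m: int, n: int) -> float:
    """Rank above which the factorized layer is no cheaper than the dense one."""
    return m * n / (m + n)


if __name__ == "__main__":
    m = n = 1024  # assumed layer shape, chosen only for illustration
    for r in (64, 256, 512):
        print(f"rank {r}: reduction = {reduction(m, n, r):.3f}")
    print("break-even rank:", break_even_rank(m, n))
```

For a 1024 × 1024 layer, rank 256 halves the parameter count and the break-even rank is 512, which is why the choice of rank per layer matters for any realized saving.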