The remarkable success of Deep Neural Networks (DNNs) has revolutionized artificial intelligence since the breakthrough of AlexNet in 2012. However, the deployment of these powerful models remains constrained by their substantial computational and memory requirements. Modern DNNs demand extensive computing resources and high-performance hardware infrastructure, resulting in significant energy consumption and limiting their applicability in resource-constrained environments such as edge devices and embedded systems. Network quantization, particularly Binary Neural Networks (BNNs), emerges as a promising direction to address these challenges by reducing numerical precision to single bits. While this approach offers theoretical advantages in model compression and computational efficiency, achieving competitive performance with binary networks presents fundamental challenges in architecture design, hardware efficiency, and optimization methodology. To address these challenges, this dissertation presents a systematic investigation of binary neural networks, beginning with the development of BNext. This novel architecture represents a fundamental rethinking of binary network design, incorporating careful analysis of information flow and loss landscape characteristics. By introducing specialized information enhancement modules, BNext effectively addresses the feature representation limitations inherent in binary networks while maintaining computational efficiency. Through systematic analysis of computation patterns and memory access mechanisms, we further develop BoolNet, an architecture specifically optimized for energy efficiency. This investigation reveals critical insights into the relationship between architectural decisions and hardware performance, culminating in optimized boolean operations and memory access patterns that significantly improve the performance-energy trade-off. 
The unique characteristics of binary networks necessitate specialized training methodologies. Building on our architectural innovations, we develop a dedicated optimizer that addresses the discrete optimization challenges in binary networks. Our approach combines theoretical analysis of gradient flow with practical considerations of training stability, resulting in robust update rules that demonstrate superior convergence properties in our evaluation. To fully realize the potential of binary networks across diverse hardware platforms, we introduce an auto-tuning framework for low-bit General Matrix Multiplication (GEMM) operations. This framework systematically addresses the complexity of performance optimization through careful analysis of GEMM templates and parameter spaces, leading to significant improvements in inference efficiency when integrated with the CUTLASS library. Extensive experimental validation demonstrates that our comprehensive approach advances the state of the art in binary neural networks across multiple dimensions: architectural design, hardware efficiency, optimization methodology, and practical deployment. This work contributes to the broader goal of enabling efficient artificial intelligence in resource-constrained environments, establishing new possibilities for energy-efficient deep learning applications.
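To make concrete why single-bit precision maps onto Boolean operations, the following minimal sketch shows the standard identity behind binary dot products: for two vectors with entries in {-1, +1}, the dot product equals n minus twice the number of positions where the signs differ, which can be computed with XOR and a population count. This is an illustration under stated assumptions and not the implementation used in BNext, BoolNet, or the GEMM auto-tuning framework; the function names `binarize`, `pack_bits`, and `binary_dot` are hypothetical.

```python
import random


def binarize(values):
    """Map real values to {-1, +1} with the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a {-1, +1} vector into an integer: bit i is set where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    XOR marks the positions where the signs differ. Each differing position
    contributes -1 and every other position contributes +1, so the result
    is n - 2 * popcount(a XOR b).
    """
    differing = bin(a_word ^ b_word).count("1")
    return n - 2 * differing


# Compare the bitwise result against an ordinary integer dot product.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(64)]
w = [random.gauss(0.0, 1.0) for _ in range(64)]
xb, wb = binarize(x), binarize(w)

reference = sum(a * b for a, b in zip(xb, wb))
fast = binary_dot(pack_bits(xb), pack_bits(wb), len(xb))
print(reference == fast)
```

A binary GEMM applies this inner product to every row and column pair, which is why bit packing and memory access patterns tend to dominate its cost on real hardware.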
Nianhui Guo studied this question.