We address the challenging problem of efficient inference across many devices and resource constraints, especially on edge devices. Conventional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally prohibitive (causing CO₂ emissions as much as 5 cars' lifetime) and thus unscalable. In this work, we propose to train a once-for-all (OFA) network that supports diverse architectural settings by decoupling training and search, to reduce the cost. We can quickly get a specialized sub-network by selecting from the OFA network without additional training. To train OFA networks efficiently, we also propose a novel progressive shrinking algorithm, a generalized pruning method that reduces the model size across many more dimensions than pruning (depth, width, kernel size, and resolution). It can obtain a surprisingly large number of sub-networks (> 10^19) that can fit different hardware platforms and latency constraints while maintaining the same level of accuracy as training independently. On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3, or the same accuracy but 1.5x faster than MobileNetV3 and 2.6x faster than EfficientNet w.r.t. measured latency) while reducing GPU hours and CO₂ emissions by many orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting (<600M MACs). OFA is the winning solution for the 3rd Low Power Computer Vision Challenge (LPCVC), DSP classification track, and the 4th LPCVC, both classification track and detection track. Code and 50 pre-trained models (for many devices & many latency constraints) are released at https://github.com/mit-han-lab/once-for-all.
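The scale of the sub-network space (> 10^19) can be sanity-checked with a short count. The sketch below assumes a hypothetical search space: 5 units, a depth per unit chosen from {2, 3, 4}, a kernel size per layer from {3, 5, 7}, and a width expansion ratio per layer from {3, 4, 6}. The abstract names only the dimensions, so these values and the function names `count_subnetworks` and `sample_subnetwork` are illustrative assumptions, not the released API. Input resolution is left out of the count because it does not change the architecture.

```python
import random

# Hypothetical search space, assumed for illustration only.
NUM_UNITS = 5              # number of sequential units in the network
DEPTHS = (2, 3, 4)         # candidate number of layers per unit
KERNEL_SIZES = (3, 5, 7)   # candidate kernel sizes per layer
WIDTH_RATIOS = (3, 4, 6)   # candidate width expansion ratios per layer


def count_subnetworks(num_units=NUM_UNITS, depths=DEPTHS,
                      kernel_sizes=KERNEL_SIZES, width_ratios=WIDTH_RATIOS):
    """Count distinct architectures in the assumed search space."""
    # Each layer independently picks one kernel size and one width ratio.
    per_layer = len(kernel_sizes) * len(width_ratios)
    # A unit with depth d has per_layer ** d configurations; sum over depths.
    per_unit = sum(per_layer ** d for d in depths)
    # Units are configured independently of each other.
    return per_unit ** num_units


def sample_subnetwork(rng, num_units=NUM_UNITS, depths=DEPTHS,
                      kernel_sizes=KERNEL_SIZES, width_ratios=WIDTH_RATIOS):
    """Draw one architecture uniformly per choice; no training is involved."""
    config = []
    for _ in range(num_units):
        depth = rng.choice(depths)
        layers = [
            {"kernel": rng.choice(kernel_sizes), "width": rng.choice(width_ratios)}
            for _ in range(depth)
        ]
        config.append(layers)
    return config


if __name__ == "__main__":
    print(f"sub-networks: {count_subnetworks():.3e}")
    print(sample_subnetwork(random.Random(0)))
```

Under these assumed choices each unit has 9^2 + 9^3 + 9^4 = 7371 configurations, so the full space has 7371^5 architectures, which is above 10^19. Selecting a specialized sub-network then amounts to picking one such configuration that meets a device's latency constraint.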
Cai et al. (Mon,) studied this question.