Neural architecture search (NAS) has a great impact by automatically designing effective neural network architectures. However, the prohibitive computational demand of conventional NAS algorithms (e.g. 10⁴ GPU hours) makes it difficult to search the architectures directly on large-scale tasks (e.g. ImageNet). Differentiable NAS can reduce the cost in GPU hours via a continuous representation of the network architecture, but it suffers from high GPU memory consumption, which grows linearly w.r.t. the candidate set size. As a result, such methods need to utilize proxy tasks, such as training on a smaller dataset, learning with only a few blocks, or training for just a few epochs. Architectures optimized on proxy tasks are not guaranteed to be optimal on the target task. In this paper, we present ProxylessNAS, which can directly learn the architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level as regular training while still allowing a large candidate set. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization. On CIFAR-10, our model achieves 2.08% test error with only 5.7M parameters, better than the previous state-of-the-art architecture AmoebaNet-B, while using 6× fewer parameters. On ImageNet, our model achieves 3.1% better top-1 accuracy than MobileNetV2, while being 1.2× faster in measured GPU latency. We also apply ProxylessNAS to specialize neural architectures for hardware with direct hardware metrics (e.g. latency) and provide insights for efficient CNN architecture design.
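The linear memory growth mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a common form of continuous relaxation in which a layer's output is a softmax-weighted sum over all candidate operations, and the names `mixed_op`, `candidates`, and `arch_logits` are hypothetical. Because every candidate's output must be kept to form the weighted sum, the number of stored values scales with the number of candidates.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, candidates, arch_logits):
    """Assumed continuous relaxation: softmax-weighted sum of all candidate ops.

    Returns the mixed output and the count of intermediate values that
    had to be held at once (one full output per candidate).
    """
    weights = softmax(arch_logits)
    outputs = [op(x) for op in candidates]  # every candidate output is kept
    mixed = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(x))
    ]
    stored_values = sum(len(out) for out in outputs)
    return mixed, stored_values


# Hypothetical candidate operations acting on a list of floats.
candidates = [
    lambda v: list(v),                    # identity
    lambda v: [2.0 * a for a in v],       # scaling
    lambda v: [max(a, 0.0) for a in v],   # ReLU
    lambda v: [0.0 for _ in v],           # zero
]

x = [1.0, -2.0, 3.0]
for n in (1, 2, 4):
    _, stored = mixed_op(x, candidates[:n], [0.0] * n)
    print(f"{n} candidates -> {stored} stored values")
```

In this toy setting, doubling the candidate set doubles the stored intermediate values, which is the scaling behaviour that motivates reducing memory to the level of regular training.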
Cai et al. studied this question.