Convolutional Neural Networks (CNNs) are among the most powerful and widely used algorithms for computer vision applications, despite their computationally demanding and memory-intensive operations. This cost stems from the bulky cross-channel computation of convolutional (CONV) layers and the massive parameter retrieval of fully-connected (FC) layers. In this paper, to remove the inter-filter redundancy, we constructed and tuned specific low-rank filters in the fully-connected layers. The proposed rank reduction saves 88.9% of both the arithmetic operations and the parameters of the fully-connected layers in the VGG16 model. In addition, by employing a network-layer-wise ping-pong DDR access mode, tile-grain on-chip feature map buffers, and a Propagate Partial Multiply-Accumulate (PPMAC) processor, we implemented a 202.4 GFLOPS CNN accelerator with a half-precision data format on a Xilinx VC709 evaluation board. Experiments show that the accelerator achieved a throughput of 6.58 fps with 0.7046 top-1 accuracy and 0.8977 top-5 accuracy at a 200 MHz working frequency.
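As a rough illustration of why a low-rank fully-connected layer needs fewer parameters and multiply-accumulates, the sketch below replaces a dense weight matrix W with two thin factors U and V and compares the costs. It is not the paper's method: the abstract gives neither the ranks used nor the tuning procedure, so the layer sizes, the rank, and all function names here are assumptions chosen for a toy example.

```python
import random

def matvec(matrix, vector):
    """Dense matrix-vector product: one multiply-accumulate per matrix entry."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Plain matrix product, used only to build the equivalent dense matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def dense_cost(n_out, n_in):
    """Parameters (and multiply-accumulates per input) of a dense FC layer."""
    return n_out * n_in

def low_rank_cost(n_out, n_in, rank):
    """Cost of the factored layer: V is rank x n_in, U is n_out x rank."""
    return rank * (n_in + n_out)

def saving(n_out, n_in, rank):
    """Fraction of parameters and arithmetic removed by the factorization."""
    return 1.0 - low_rank_cost(n_out, n_in, rank) / dense_cost(n_out, n_in)

random.seed(0)
n_out, n_in, rank = 8, 12, 2  # toy sizes (assumed, not taken from VGG16)

# Hypothetical factors; a real layer would obtain them by decomposing and
# then fine-tuning trained weights.
U = [[random.uniform(-1, 1) for _ in range(rank)] for _ in range(n_out)]
V = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(rank)]
W = matmul(U, V)  # the equivalent dense matrix, of rank at most `rank`

x = [random.uniform(-1, 1) for _ in range(n_in)]
y_dense = matvec(W, x)                 # n_out * n_in multiply-accumulates
y_low_rank = matvec(U, matvec(V, x))   # rank * (n_in + n_out) of them

max_diff = max(abs(a - b) for a, b in zip(y_dense, y_low_rank))
print(dense_cost(n_out, n_in), low_rank_cost(n_out, n_in, rank), max_diff)
```

In this factored form the saving depends only on the rank relative to the layer dimensions, so the same fraction applies to parameter storage and to arithmetic, which is consistent with the abstract reporting a single figure for both.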
Mei et al. studied this question.
synapsesocial.com/papers/6a14d589c03bfb96ef29ea2f — DOI: https://doi.org/10.1109/globalsip.2017.8309067
Chunsheng Mei
Chinese Academy of Sciences
Zhenyu Liu
University of Surrey
Yue Niu
Tianjin University of Science and Technology
Tsinghua University
Northwestern Polytechnical University