The application of convolutional neural networks (CNNs) relies mainly on training over big data, and relatively accurate results often need to be obtained in real time. Although GPUs have several advantages over CPUs, most clusters in mainstream parallel computing environments are built from CPU-based servers, so it is more practical to accelerate these algorithms by parallelizing them on CPU-based clusters. In this paper, an improved CNN algorithm based on the Winograd algorithm was proposed. First, we analyzed the Winograd algorithm and found that its speed advantage is greatest for small convolution kernels, such as 3×3. We then increased the speed of the convolution process by reducing the computational complexity of convolution. We also discussed the parallel implementation and optimization of the improved Winograd algorithm on a CPU-based cluster. Finally, we compared the computing efficiency with different numbers of computing cores on the Intel Xeon Phi CPU platform, and the results show that the best parallel performance is about 6 TFLOPS.
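The abstract does not include the authors' implementation. The following is a minimal Python sketch of the standard Winograd minimal-filtering algorithm F(2×2, 3×3), offered only to illustrate why a 3×3 kernel benefits. It makes several assumptions: a single channel, stride 1, no padding, and an even output size. The function names (`winograd_conv2d`, `direct_conv2d`, and so on) are hypothetical. The filter is transformed once (U = G g Gᵀ), each 4×4 input tile is transformed (V = Bᵀ d B), and the two are multiplied element-wise. The output transform (Y = Aᵀ M A) then yields a 2×2 output tile. The sketch does not model the CPU-cluster parallelization the paper describes.

```python
# Sketch of Winograd F(2x2, 3x3) for a single-channel "convolution"
# (cross-correlation, as used in CNN layers). Illustrative only; these are the
# standard F(2,3) transform matrices, not necessarily the paper's variant.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Input transform B^T, filter transform G, output transform A^T for F(2,3).
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]

def transform_filter(g):
    """U = G g G^T: computed once per 3x3 filter and reused for every tile."""
    return matmul(matmul(G, g), transpose(G))

def winograd_tile(d, u):
    """2x2 output from a 4x4 input tile d and a pre-transformed filter u.
    Only the 16 element-wise products below multiply data by weights."""
    v = matmul(matmul(BT, d), transpose(BT))          # V = B^T d B
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(AT, m), transpose(AT))       # Y = A^T M A

def winograd_conv2d(image, g):
    """Valid 3x3 cross-correlation using overlapping 4x4 input tiles.
    Assumes the output height and width (H-2, W-2) are both even."""
    h, w = len(image), len(image[0])
    oh, ow = h - 2, w - 2
    assert oh % 2 == 0 and ow % 2 == 0, "sketch assumes even output size"
    u = transform_filter(g)
    out = [[0.0] * ow for _ in range(oh)]
    for ty in range(0, oh, 2):
        for tx in range(0, ow, 2):
            tile = [row[tx:tx + 4] for row in image[ty:ty + 4]]
            y = winograd_tile(tile, u)
            for i in range(2):
                for j in range(2):
                    out[ty + i][tx + j] = y[i][j]
    return out

def direct_conv2d(image, g):
    """Reference: direct valid 3x3 cross-correlation (9 multiplies per output)."""
    h, w = len(image), len(image[0])
    return [[sum(image[y + i][x + j] * g[i][j] for i in range(3) for j in range(3))
             for x in range(w - 2)] for y in range(h - 2)]
```

For example, `winograd_conv2d(image, g)` on a 6×6 image should match `direct_conv2d(image, g)` up to floating-point rounding. Each 2×2 output tile costs 16 data-weight multiplications instead of the 36 a direct 3×3 convolution needs, which is 2.25× fewer. Here the input and output transforms are written as small matrix products for readability. An optimized CPU version would normally expand them into additions and subtractions and vectorize across tiles and channels.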
Huang et al. (Thu,) studied this question.