July 22, 2008

High-performance implementation of the level-3 BLAS

Abstract

A simple but highly effective approach for transforming high-performance implementations on cache-based architectures of matrix-matrix multiplication into implementations of other commonly used matrix-matrix computations (the level-3 BLAS) is presented. Exceptional performance is demonstrated on various architectures.
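As an illustration of the general idea in the abstract, the sketch below computes a symmetric matrix product (the level-3 BLAS operation SYMM) by reusing a generic matrix-matrix multiplication kernel. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`gemm_block`, `pack_symmetric_block`, `symm_lower`), the block size, and the list-of-lists storage are all hypothetical, and pure Python is used only for readability. The assumption is that only the lower triangle of the symmetric matrix holds valid data, and that a packing step copies each block into a dense buffer before the unchanged multiplication kernel is applied.

```python
def gemm_block(a_blk, b_blk, c_rows):
    """Generic kernel: c_rows += a_blk * b_blk, all dense lists of rows."""
    for i, a_row in enumerate(a_blk):
        c_row = c_rows[i]
        for p, a_ip in enumerate(a_row):
            b_row = b_blk[p]
            for j, b_pj in enumerate(b_row):
                c_row[j] += a_ip * b_pj


def pack_symmetric_block(a_lower, r0, r1, c0, c1):
    """Copy rows r0:r1, columns c0:c1 of a symmetric matrix into a dense block.

    Only entries on or below the diagonal of a_lower are read; an entry
    above the diagonal is taken from its mirror image instead.
    """
    return [
        [a_lower[i][j] if i >= j else a_lower[j][i] for j in range(c0, c1)]
        for i in range(r0, r1)
    ]


def symm_lower(a_lower, b, block=2):
    """Compute C = A * B where A is symmetric and stored in its lower triangle."""
    n = len(a_lower)
    m = len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for r0 in range(0, n, block):
        r1 = min(r0 + block, n)
        for c0 in range(0, n, block):
            c1 = min(c0 + block, n)
            a_blk = pack_symmetric_block(a_lower, r0, r1, c0, c1)
            # The slice c[r0:r1] shares row objects with c, so the kernel
            # updates the result in place.
            gemm_block(a_blk, b[c0:c1], c[r0:r1])
    return c
```

In this sketch the only SYMM-specific code is the packing routine; the multiplication kernel is the same one a general matrix-matrix product would use, which is the kind of reuse the abstract describes.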

Cite This Study

Goto et al. (2008). High-performance implementation of the level-3 BLAS.

synapsesocial.com/papers/6a1044fb4fb650da4fff3328 https://doi.org/10.1145/1377603.1377607