Revisiting Strassen’s Matrix Multiplication for Multicore Systems

Domingo Giménez Cánovas


Strassen’s matrix multiplication reduces the computational cost of multiplying two n × n matrices from the O(n^3) of a conventional three-loop implementation to approximately O(n^2.81). The reduction comes at the expense of additional operations of cost O(n^2), and extra memory is needed for the temporary results of the recursive calls. The advantage of Strassen’s algorithm therefore only becomes apparent for large matrices, and it requires careful implementation. The increasing speed of computational systems with several cores sharing a common memory space also makes it harder for the algorithm to compete against highly optimized three-loop multiplications. This paper discusses various aspects that need to be addressed when designing Strassen multiplications for today’s multicore systems.
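The recursive scheme the abstract refers to can be sketched as follows. This is a minimal, illustrative Python/NumPy version, not the implementation studied in the paper; in particular the `cutoff` parameter, below which the routine falls back to the ordinary product, is an assumption that reflects the careful-tuning point made above (the O(n^2) additions dominate for small blocks):

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Sketch of Strassen's recursion (classical M1..M7 formulation).

    A, B: square matrices of the same size. Below `cutoff`, or when the
    size is odd, fall back to the ordinary product.
    """
    n = A.shape[0]
    if n <= cutoff or n % 2 != 0:
        return A @ B
    h = n // 2
    # Partition each operand into four h x h blocks.
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven recursive products instead of eight; the extra additions and
    # the temporaries M1..M7 are the O(n^2) overhead noted in the abstract.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    # Recombine the seven products into the four blocks of C.
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

With one level of recursion saved per halving, the recurrence T(n) = 7 T(n/2) + O(n^2) gives the O(n^log2(7)) ≈ O(n^2.81) cost mentioned above; the cutoff value at which the fallback product becomes preferable is exactly the kind of machine-dependent parameter the paper's discussion concerns.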


Strassen matrix multiplication; multicore; linear algebra libraries; batched linear algebra routines




This work is licensed under a Creative Commons Attribution 3.0 License.