## Revisiting Strassen’s Matrix Multiplication for Multicore Systems
