Revisiting Strassen’s Matrix Multiplication for Multicore Systems
Main Article Content
Abstract
Keywords:
Downloads
Article Details
References
A. Abdelfattah, A. Haidar, S. Tomov, and J. J. Dongarra. Performance, design, and autotuning of batched GEMM for GPUs. In
Proceedings of 31st International Conference on High Performance Computing, pages 21–38, 2016.
E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, and S. Tomov. Numerical linear
algebra on emerging architectures: The PLASMA and MAGMA projects. Journal of Physics: Conference Series, 180(1), 2009.
P. Alberti, P. Alonso, A. M. Vidal, J. Cuenca, and D. Giménez. Designing polylibraries to speed up linear algebra computations.
International Journal of High Performance Computing and Networking, 1(1/2/3):75–84, 2004.
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. J. Dongarra, J. D. Croz, A. Grenbaum, S. Hammarling, A. McKenney,
S. Ostrouchov, and D. Sorensen. LAPACK User’s Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA,
G. Bernabé, J. Cuenca, L. García, and D. Giménez. Auto-tuning techniques for linear algebra routines on hybrid platforms. J.
Comput. Science, 10:299–310, 2015.
G. Brassard and P. Bratley. Fundamentals of Algorithms. Prentice-Hall, 1996.
J. Cámara, J. Cuenca, L. García, and D. Giménez. Auto-tuned nested parallelism: A way to reduce the execution time of scientific
software in NUMA systems. Parallel Computing, 40(7):309–327, 2014.
J. Cámara, J. Cuenca, D. Giménez, L. García, and A. M. Vidal. Empirical installation of linear algebra shared-memory subroutines
for auto-tuning. International Journal of Parallel Programming, 42(3):408–434, 2014.
T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. The MIT Press, 1990.
J. J. Dongarra, S. Hammarling, N. J. Higham, S. D. Relton, and M. Zounon. Optimized batched linear algebra for modern
architectures. In Proceedings of Euro-Par 2017, pages 511–522, 2017.
G. Golub and C. F. V. Loan. Matrix Computations. The John Hopkins University Press, fourth edition, 2013.
J. Huang, T. M. Smith, G. M. Henry, and R. A. van de Geijn. Strassen’s algorithm reloaded. In Proceedings of the International
Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016, Salt Lake City, UT, USA, November
-18, 2016, pages 690–701, 2016.
S. Hunold and T. Rauber. Automatic tuning of PDGEMM towards optimal performance. In 11th International Euro-Par
Conference, Lecture Notes in Computer Science, volume 3648, pages 837–846, 2005.
Intel MKL web page. http://software.intel.com/en-us/intel-mkl/.
J. Kurzak, H. Ltaief, J. Dongarra, and R. M. Badia. Scheduling dense linear algebra operations on multicore processors.
Concurrency and Computation: Practice and Experience, 22(1):15–44, 2010.
T. Sakurai, T. Katagiri, H. Kuroda, K. Naono, M. Igai, and S. Ohshima. A sparse matrix library with automatic selection of
iterative solvers and preconditioners. In Proceedings of the International Conference on Computational Science (ICCS), LNCS,
pages 1332–1341, Barcelona, Spain, June 2013.
V. Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 3(14):354–356, 1969.