PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers. The paper describes the Parallel Universal Matrix Multiplication Algorithms (PUMMA) on distributed memory concurrent computers. The PUMMA package includes not only the non-transposed matrix multiplication routine C = A ⋅ B, but also the transposed multiplication routines C = Aᵀ ⋅ B, C = A ⋅ Bᵀ, and C = Aᵀ ⋅ Bᵀ, for a block cyclic data distribution. The routines perform efficiently for a wide range of processor configurations and block sizes. Together, the PUMMA routines provide the same functionality as the Level 3 BLAS routine xGEMM. Details of the parallel implementation of the routines are given, and results are presented for runs on the Intel Touchstone Delta computer.
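To make the described functionality concrete, the sketch below illustrates the xGEMM-style operation C = op(A) ⋅ op(B), where op(X) is either X or Xᵀ, covering the four PUMMA variants, together with the standard block-cyclic ownership mapping the abstract refers to. This is a minimal serial illustration in Python/NumPy, not the PUMMA API itself (PUMMA is a distributed-memory Fortran library); the names `gemm` and `owner` are hypothetical.

```python
import numpy as np

def gemm(A, B, transa=False, transb=False):
    """Serial illustration of the xGEMM operation C = op(A) @ op(B),
    where op(X) is X or X^T depending on the transpose flags."""
    opA = A.T if transa else A
    opB = B.T if transb else B
    return opA @ opB

def owner(i, nb, p):
    """Process owning global block-row (or block-column) index i under a
    1-D block-cyclic distribution with block size nb over p processes:
    blocks of nb consecutive rows are dealt out to processes round-robin."""
    return (i // nb) % p

# The four variants provided by the PUMMA package (square inputs so all
# four shapes are conformant):
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C1 = gemm(A, B)                            # C = A . B
C2 = gemm(A, B, transa=True)               # C = A^T . B
C3 = gemm(A, B, transb=True)               # C = A . B^T
C4 = gemm(A, B, transa=True, transb=True)  # C = A^T . B^T
```

In the distributed setting, the transpose variants are the hard part: under a block-cyclic distribution the blocks of Aᵀ live on different processes than those of A, so each variant needs its own communication pattern rather than a local transpose.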
References in zbMATH (referenced in 8 articles)
- Auckenthaler, T.; Bader, M.; Huckle, T.; Spörl, A.; Waldherr, K.: Matrix exponentials and parallel prefix computation in a quantum control problem (2010)
- Choi, Jaeyoung: PoLAPACK: Parallel factorization routines with algorithmic blocking (2001)
- D’Azevedo, Eduardo; Dongarra, Jack: The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines (2001)
- Tinetti, Fernando; Quijano, Antonio; De Giusti, Armando; Luque, Emilio: Heterogeneous networks of workstations and the parallel matrix multiplication (2001)
- Choi, Jaeyoung: A new parallel matrix multiplication algorithm on distributed-memory concurrent computers (1998)
- Choi, Jaeyoung; Dongarra, Jack J.; Walker, David W.: Parallel matrix transpose algorithms on distributed memory concurrent computers (1995)
- Chou, C.-C.; Deng, Y.-F.; Li, G.; Wang, Y.: Parallelizing Strassen’s method for matrix multiplication on distributed-memory MIMD architectures (1995)
- Wolff von Gudenberg, Jürgen: Design of a parallel linear algebra library for verified computation (1995)