Elemental: A New Framework for Distributed Memory Dense Matrix Computations

Parallelizing dense matrix computations on distributed-memory architectures is a well-studied subject and is generally considered to be among the best-understood domains of parallel computing. Two packages developed in the mid-1990s, ScaLAPACK and PLAPACK, still enjoy regular use. With the advent of many-core architectures, which may very well take the shape of distributed-memory architectures within a single processor, these packages must be revisited, since the traditional MPI-based approaches will likely need to be extended. Thus, this is a good time to review lessons learned since the introduction of these two packages and to propose a simple yet effective alternative. Preliminary performance results show that the new solution achieves competitive, if not superior, performance on large clusters.

This software is also peer-reviewed by the journal TOMS (ACM Transactions on Mathematical Software).

References in zbMATH (referenced in 25 articles; results 1 to 20, sorted by year):


  1. Myllykoski, Mirko: Algorithm 1019: a task-based multi-shift QR/QZ algorithm with aggressive early deflation (2022)
  2. Jolivet, Pierre; Roman, Jose E.; Zampini, Stefano: KSPHPDDM and PCHPDDM: extending PETSc with advanced Krylov methods and robust multilevel overlapping Schwarz preconditioners (2021)
  3. Popovici, Doru Thom; Schatz, Martin D.; Franchetti, Franz; Low, Tze Meng: A flexible framework for multidimensional DFTs (2020)
  4. Tanaka, Kazuyuki; Imachi, Hiroto; Fukumoto, Tomoya; Kuwata, Akiyoshi; Harada, Yuki; Fukaya, Takeshi; Yamamoto, Yusaku; Hoshi, Takeo: EigenKernel (2019)
  5. Harlim, John; Yang, Haizhao: Diffusion forecasting model with basis functions from QR-decomposition (2018)
  6. Avron, Haim; Clarkson, Kenneth L.; Woodruff, David P.: Faster kernel ridge regression using sketching and preconditioning (2017)
  7. Boutsidis, Christos; Drineas, Petros; Kambadur, Prabhanjan; Kontopoulou, Eugenia-Maria; Zouzias, Anastasios: A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix (2017)
  8. Di Napoli, Edoardo; Peise, Elmar; Hrywniak, Markus; Bientinesi, Paolo: High-performance generation of the Hamiltonian and overlap matrices in FLAPW methods (2017)
  9. Li, Yingzhou; Ying, Lexing: Distributed-memory hierarchical interpolative factorization (2017)
  10. Lu, Jianfeng; Yang, Haizhao: Preconditioning orbital minimization method for planewave discretization (2017)
  11. Martinsson, Per-Gunnar; Quintana Ortí, Gregorio; Heavner, Nathan; van de Geijn, Robert: Householder QR factorization with randomization for column pivoting (HQRRP) (2017)
  12. Van Zee, Field G.; Smith, Tyler M.: Implementing high-performance complex matrix multiplication via the 3m and 4m methods (2017)
  13. Beliakov, Gleb; Matiyasevich, Yuri: A parallel algorithm for calculation of determinants and minors using arbitrary precision arithmetic (2016)
  14. Nourgaliev, R.; Luo, H.; Weston, B.; Anderson, A.; Schofield, S.; Dunn, T.; Delplanque, J.-P.: Fully-implicit orthogonal reconstructed discontinuous Galerkin method for fluid dynamics with phase change (2016)
  15. Schatz, Martin D.; van de Geijn, Robert A.; Poulson, Jack: Parallel matrix multiplication: a systematic journey (2016)
  16. Van Loan, Charles F.: Structured matrix problems from tensors (2016)
  17. Banerjee, Amartya S.; Elliott, Ryan S.; James, Richard D.: A spectral scheme for Kohn-Sham density functional theory of clusters (2015)
  18. Van Zee, Field G.; van de Geijn, Robert A.: BLIS: a framework for rapidly instantiating BLAS functionality (2015)
  19. Vecharynski, Eugene; Yang, Chao; Pask, John E.: A projected preconditioned conjugate gradient algorithm for computing many extreme eigenpairs of a Hermitian matrix (2015)
  20. Fabregat-Traver, Diego; Aulchenko, Yurii S.; Bientinesi, Paolo: Solving sequences of generalized least-squares problems on multi-threaded architectures (2014)