SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks

This paper describes SuperMatrix, a runtime system that parallelizes matrix operations for SMP and/or multi-core architectures. We use this system to demonstrate how code described at a high level of abstraction can achieve high performance on such architectures while completely hiding the parallelism from the library programmer. The key insight is to view matrices hierarchically, as collections of blocks that serve as units of data, with operations over those blocks treated as units of computation. The implementation transparently enqueues the required operations, tracks the dependencies among them internally, and then executes them out of order, using techniques inspired by superscalar microarchitectures. This separation of concerns allows library developers to implement algorithms without concerning themselves with the parallelization aspect of the problem. Different heuristics for scheduling operations can be implemented in the runtime system independently of the code that enqueues the operations. Results gathered on a 16-CPU ccNUMA Itanium2 server demonstrate excellent performance.
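The enqueue, track, and execute cycle described in the abstract can be illustrated with a small model. The Python below is a minimal serial sketch written under stated assumptions, not the SuperMatrix implementation or the libflame API: the names `Task`, `Runtime`, `enqueue`, `execute`, `pick`, and `enqueue_cholesky` are hypothetical; dependencies are inferred from each task's read and write sets of blocks (read-after-write, write-after-read, and write-after-write hazards, by analogy with the hazards a superscalar processor tracks between instructions); and tasks run one at a time rather than on multiple threads. A Cholesky factorization by blocks is used only as an illustrative operation, and the task bodies merely log their names in place of real BLAS kernels.

```python
# Toy, serial sketch of runtime scheduling for algorithms-by-blocks.
# All names are hypothetical; this is NOT the SuperMatrix/libflame API.


class Task:
    def __init__(self, name, fn, reads, writes):
        self.name = name
        self.fn = fn                # the block operation (a kernel call in practice)
        self.reads = set(reads)     # blocks that are only read
        self.writes = set(writes)   # blocks that are read and overwritten
        self.deps = set()           # tasks that must finish before this one
        self.dependents = []        # tasks waiting on this one


class Runtime:
    def __init__(self):
        self.tasks = []
        self.last_writer = {}   # block -> most recent task that writes it
        self.readers = {}       # block -> tasks that read it since that write

    def enqueue(self, name, fn, reads=(), writes=()):
        """Record a task and infer its dependencies from the blocks it touches."""
        t = Task(name, fn, reads, writes)
        for b in t.reads | t.writes:
            if b in self.last_writer:               # RAW and WAW hazards
                t.deps.add(self.last_writer[b])
        for b in t.writes:
            t.deps.update(self.readers.get(b, []))  # WAR hazards
        for d in t.deps:
            d.dependents.append(t)
        for b in t.reads:
            self.readers.setdefault(b, []).append(t)
        for b in t.writes:
            self.last_writer[b] = t
            self.readers[b] = []
        self.tasks.append(t)
        return t

    def execute(self, pick=lambda ready: 0):
        """Dispatch each task once its dependencies have finished; `pick` is the
        scheduling heuristic that chooses which ready task runs next."""
        remaining = {t: len(t.deps) for t in self.tasks}
        ready = [t for t in self.tasks if remaining[t] == 0]
        order = []
        while ready:
            t = ready.pop(pick(ready))
            t.fn()
            order.append(t.name)
            for d in t.dependents:
                remaining[d] -= 1
                if remaining[d] == 0:
                    ready.append(d)
        return order


def enqueue_cholesky(rt, n, log):
    """Enqueue a right-looking Cholesky by blocks on the lower triangle of an
    n x n grid of blocks named by (row, col). Task bodies only log their names."""
    def add(name, reads=(), writes=()):
        rt.enqueue(name, lambda: log.append(name), reads, writes)

    for k in range(n):
        add(f"CHOL({k},{k})", writes=[(k, k)])
        for i in range(k + 1, n):
            add(f"TRSM({i},{k})", reads=[(k, k)], writes=[(i, k)])
        for i in range(k + 1, n):
            add(f"SYRK({i},{i})@{k}", reads=[(i, k)], writes=[(i, i)])
            for j in range(k + 1, i):
                add(f"GEMM({i},{j})@{k}", reads=[(i, k), (j, k)], writes=[(i, j)])


if __name__ == "__main__":
    fifo_rt = Runtime()
    enqueue_cholesky(fifo_rt, 3, [])
    print(fifo_rt.execute())                                  # FIFO heuristic
    lifo_rt = Runtime()
    enqueue_cholesky(lifo_rt, 3, [])
    print(lifo_rt.execute(pick=lambda ready: len(ready) - 1))  # LIFO heuristic
```

Swapping the `pick` heuristic changes the execution order without touching `enqueue_cholesky`: with the LIFO choice, `TRSM(2,0)` runs before `TRSM(1,0)` even though it was enqueued later, while every task still runs only after the tasks it depends on. In a simplified way, this mirrors the separation between scheduling heuristics and the code that enqueues the operations.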

References in zbMATH (referenced in 6 articles)

  1. Van Zee, Field G.; van de Geijn, Robert A.: BLIS: a framework for rapidly instantiating BLAS functionality (2015)
  2. Bosilca, George; Bouteiller, Aurelien; Danalis, Anthony; Herault, Thomas; Lemarinier, Pierre; Dongarra, Jack: DAGuE: A generic distributed DAG engine for high performance computing (2012)
  3. Igual, Francisco D.; Chan, Ernie; Quintana-Ortí, Enrique S.; Quintana-Ortí, Gregorio; van de Geijn, Robert A.; Van Zee, Field G.: The FLAME approach: from dense linear algebra algorithms to high-performance multi-accelerator implementations (2012)
  4. Agullo, Emmanuel; Bouwmeester, Henricus; Dongarra, Jack; Kurzak, Jakub; Langou, Julien; Rosenberg, Lee: Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures (2011)
  5. Milani, Cleber Roberto; Kolberg, Mariana; Fernandes, Luiz Gustavo: Solving dense interval linear systems with verified computing on multicore architectures (2011)
  6. Quintana-Ortí, Gregorio; Quintana-Ortí, Enrique S.; van de Geijn, Robert A.; Van Zee, Field G.; Chan, Ernie: Programming matrix algorithms-by-blocks for thread-level parallelism (2009)