Sparsity: Optimization Framework for Sparse Matrix Kernels

Sparse matrix–vector multiplication is an important computational kernel that performs poorly on most modern processors due to a low compute-to-memory ratio and irregular memory access patterns. Optimization is difficult because of the complexity of cache-based memory systems and because performance is highly dependent on the non-zero structure of the matrix. The SPARSITY system is designed to address these problems by allowing users to automatically build sparse matrix kernels that are tuned to their matrices and machines. SPARSITY combines traditional techniques such as loop transformations with data structure transformations and optimization heuristics that are specific to sparse matrices. It provides a novel framework for selecting optimization parameters, such as block size, using a combination of performance models and search. In this paper we discuss the optimization of two operations: a sparse matrix times a dense vector and a sparse matrix times a set of dense vectors. Our experience indicates that register level optimizations are effective for matrices arising in certain scientific simulations, in particular finite-element problems. Cache level optimizations are important when the vector used in multiplication is larger than the cache size, especially for matrices in which the non-zero structure is random. For applications involving multiple vectors, reorganizing the computation to perform the entire set of multiplications as a single operation produces significant speedups. We describe the different optimizations and parameter selection techniques and evaluate them on several machines using over 40 matrices taken from a broad set of application domains. Our results demonstrate speedups of up to 4× for the single vector case and up to 10× for the multiple vector case.
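To make the two operations concrete, here is a minimal Python sketch of both kernels over the standard compressed sparse row (CSR) format. The CSR layout and loop structure are textbook material; the function names, the example matrix, and the code itself are illustrative and not taken from the paper (SPARSITY generates tuned, blocked variants of these loops rather than this naive form). Note how the multiple-vector version loads each non-zero once and applies it to all vectors, which is the computation reorganization the abstract credits with the largest speedups.

```python
def csr_spmv(rowptr, colind, vals, x):
    """y = A*x with A stored in compressed sparse row (CSR) format.

    rowptr[i]:rowptr[i+1] delimits the non-zeros of row i; colind holds
    their column indices and vals their values.
    """
    n = len(rowptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(rowptr[i], rowptr[i + 1]):
            # Irregular access: x[colind[k]] jumps around memory.
            acc += vals[k] * x[colind[k]]
        y[i] = acc
    return y


def csr_spmm(rowptr, colind, vals, X):
    """Y = A*X for a set of dense vectors, stored as rows X[j] of length m.

    Each non-zero vals[k] is loaded once and applied to all m vectors,
    amortizing the irregular access to A and to X across the whole set.
    """
    n = len(rowptr) - 1
    m = len(X[0])
    Y = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for k in range(rowptr[i], rowptr[i + 1]):
            a = vals[k]
            xj = X[colind[k]]
            for v in range(m):
                Y[i][v] += a * xj[v]
    return Y


# Example: the 3x3 matrix [[1,0,2],[0,3,0],[4,0,5]] in CSR form.
rowptr = [0, 2, 3, 5]
colind = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]

print(csr_spmv(rowptr, colind, vals, [1.0, 1.0, 1.0]))        # [3.0, 3.0, 9.0]
print(csr_spmm(rowptr, colind, vals,
               [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]))
```

Register blocking, as used by SPARSITY, would further restructure these loops to operate on small dense r×c blocks (e.g. 2×2) so the innermost multiply-adds can stay in registers; the best block size depends on the matrix and machine, which is why the system selects it via models and search.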

References in zbMATH (referenced in 11 articles)

Sorted by year.

  1. Phipps, E.; D'Elia, M.; Edwards, H. C.; Hoemmen, M.; Hu, J.; Rajamanickam, S.: Embedded ensemble propagation for improving performance, portability, and scalability of uncertainty quantification on emerging computational architectures (2017)
  2. Röhrig-Zöllner, Melven; Thies, Jonas; Kreutzer, Moritz; Alvermann, Andreas; Pieper, Andreas; Basermann, Achim; Hager, Georg; Wellein, Gerhard; Fehske, Holger: Increasing the performance of the Jacobi-Davidson method by blocking (2015)
  3. Zhu, Sheng-Xin; Gu, Tong-Xiang; Liu, Xing-Ping: Minimizing synchronizations in sparse iterative solvers for distributed supercomputers (2014)
  4. Abed, Khalid H.; Morris, Gerald R.: Improving performance of codes with large/irregular stride memory access patterns via high performance reconfigurable computers (2013)
  5. Lu, Qingda; Gao, Xiaoyang; Krishnamoorthy, Sriram; Baumgartner, Gerald; Ramanujam, J.; Sadayappan, P.: Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions (2012)
  6. Georgii, Joachim; Westermann, Rüdiger: A streaming approach for sparse matrix products and its application in Galerkin multigrid methods (2010)
  7. Krotkiewski, M.; Dabrowski, M.: Parallel symmetric sparse matrix-vector product on scalar multi-core CPUs (2010)
  8. Williams, Samuel; Shalf, John; Oliker, Leonid; Kamil, Shoaib; Husbands, Parry; Yelick, Katherine: Scientific computing kernels on the cell processor (2007)
  9. Baglama, James; Reichel, Lothar: Restarted block Lanczos bidiagonalization methods (2006)
  10. Kirby, Robert C.; Knepley, Matthew; Logg, Anders; Scott, L. Ridgway: Optimizing the evaluation of finite element matrices (2005)
  11. Pinar, Ali; Vassilevska, Virginia: Finding nonoverlapping substructures of a sparse matrix (2005)