clSpMV: a cross-platform OpenCL SpMV framework on GPUs.

The sparse matrix-vector multiplication (SpMV) kernel is a key computation in linear algebra: most iterative methods are composed of SpMV operations combined with BLAS1 updates. Researchers have therefore made extensive efforts to optimize the SpMV kernel in sparse linear algebra. With the advent of OpenCL, a programming framework that standardizes parallel programming across a wide variety of heterogeneous platforms, the SpMV kernel can be optimized on many different platforms. In this paper, we propose a new sparse matrix format, the Cocktail Format, which combines the strengths of many different sparse matrix formats. Based on the Cocktail Format, we develop the clSpMV framework, which analyzes arbitrary sparse matrices at runtime and recommends the best representation of a given sparse matrix on each platform. Although solutions that are portable across diverse platforms generally deliver lower performance than solutions specialized to particular platforms, our experimental results show that clSpMV can find the best representations of the input sparse matrices on both Nvidia and AMD platforms, delivering 83% higher performance than the vendor-optimized CUDA implementation of the hybrid sparse format proposed in [3], and 63.6% higher performance than the CUDA implementations of all sparse formats in [3].
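The hybrid format of [3] splits a matrix into a regular ELL part plus a COO overflow part; the Cocktail Format generalizes this idea to a mix of many formats. A minimal sketch of the two-format split and the corresponding SpMV is shown below. This is an illustrative Python example, not the clSpMV implementation; the function names and the list-of-lists matrix representation are assumptions made for clarity.

```python
def hybrid_split(rows, k):
    """Split a sparse matrix into an ELL part and a COO overflow part.

    rows: list of rows, each a list of (column, value) nonzeros.
    k:    ELL width, i.e. the number of nonzeros per row stored in ELL.
    """
    ell = [r[:k] for r in rows]  # first k nonzeros of each row (ELL slab)
    coo = [(i, c, v)             # remaining nonzeros as (row, col, val) triples
           for i, r in enumerate(rows) for (c, v) in r[k:]]
    return ell, coo

def hybrid_spmv(ell, coo, x):
    """Compute y = A*x from the two partial representations."""
    # Regular ELL part: every row contributes at most k products.
    y = [sum(v * x[c] for (c, v) in row) for row in ell]
    # Irregular COO overflow: scatter the remaining products.
    for i, c, v in coo:
        y[i] += v * x[c]
    return y
```

The point of the split is that the ELL part has a regular, SIMD-friendly access pattern while the small COO remainder absorbs the few long rows that would otherwise force padding; the Cocktail Format extends the same partitioning decision to a larger menu of formats chosen at runtime.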

This software is also peer reviewed by the journal TOMS (ACM Transactions on Mathematical Software).

References in zbMATH (referenced in 10 articles)

Showing results 1 to 10 of 10.
Sorted by year (citations)

  1. Chen, Yuedan; Xiao, Guoqing; Wu, Fan; Tang, Zhuo; Li, Keqin: tpSpMV: a two-phase large-scale sparse matrix-vector multiplication kernel for manycore architectures (2020)
  2. Tan, Guangming; Liu, Junhong; Li, Jiajia: Design and implementation of adaptive SpMV library for multicore and many-core architectures (2018)
  3. Filippone, Salvatore; Cardellini, Valeria; Barbieri, Davide; Fanfarillo, Alessandro: Sparse matrix-vector multiplication on GPGPUs (2017)
  4. Gao, Jiaquan; Wu, Kesong; Wang, Yushun; Qi, Panpan; He, Guixia: GPU-accelerated preconditioned GMRES method for two-dimensional Maxwell’s equations (2017)
  5. Gao, Jiaquan; Qi, Panpan; He, Guixia: Efficient CSR-based sparse matrix-vector multiplication on GPU (2016)
  6. Rupp, Karl; Tillet, Philippe; Rudolf, Florian; Weinbub, Josef; Morhammer, Andreas; Grasser, Tibor; Jüngel, Ansgar; Selberherr, Siegfried: ViennaCL-linear algebra library for multi- and many-core architectures (2016)
  7. Mazhar, Hammad; Heyn, Toby; Negrut, Dan; Tasora, Alessandro: Using Nesterov’s method to accelerate multibody dynamics with friction and contact (2015)
  8. Müthing, Steffen; Ribbrock, Dirk; Göddeke, Dominik: Integrating multi-threading and accelerators into DUNE-ISTL (2015)
  9. Kreutzer, Moritz; Hager, Georg; Wellein, Gerhard; Fehske, Holger; Bishop, Alan R.: A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units (2014)
  10. Rossi, R.; Mossaiby, F.; Idelsohn, S. R.: A portable OpenCL-based unstructured edge-based finite element Navier-Stokes solver on graphics hardware (2013)