CULA: Hybrid GPU accelerated linear algebra routines.

The modern graphics processing unit (GPU) found in many standard personal computers is a highly parallel math processor capable of nearly 1 TFLOPS peak throughput, at a cost similar to a high-end CPU and with an excellent FLOPS/watt ratio. High-level linear algebra operations are computationally intense, often requiring O(N^3) operations, and would seem a natural fit for the processing power of the GPU. Our work is on CULA, a GPU-accelerated implementation of linear algebra routines. We present results from factorizations such as LU decomposition, singular value decomposition and QR decomposition, along with applications such as system solution and least squares. The GPU execution model featured by NVIDIA GPUs based on CUDA demands very strong parallelism, requiring between hundreds and thousands of simultaneous operations to achieve high performance. Some constructs from linear algebra map extremely well to the GPU and others map poorly. CPUs, on the other hand, do well at smaller-order parallelism and perform acceptably during low-parallelism code segments. Our work addresses this via a hybrid processing model, in which the CPU and GPU work simultaneously to produce results. In many cases, this is accomplished by allowing each platform to do the work it performs most naturally.
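The division of labor described above can be illustrated with a blocked LU factorization. The sketch below is plain Python and is not CULA's API or its actual algorithm; the function names (`factor_panel`, `update_trailing`, `blocked_lu`, `reconstruct`) are hypothetical. It assumes a common arrangement in which the narrow, pivot-dependent panel step is the low-parallelism part suited to a CPU, and the large trailing-matrix update is the highly parallel part suited to a GPU. Both steps run sequentially here; a real hybrid code would overlap them on the two devices.

```python
def factor_panel(a, k, kb, piv):
    """Low-parallelism step (assumed CPU side): unblocked LU with
    partial pivoting on columns k .. k+kb-1, done in place."""
    n = len(a)
    for j in range(k, k + kb):
        # Pick the row with the largest entry in column j as the pivot.
        p = max(range(j, n), key=lambda i: abs(a[i][j]))
        if a[p][j] == 0.0:
            raise ValueError("matrix is singular")
        if p != j:
            a[j], a[p] = a[p], a[j]          # swap whole rows
            piv[j], piv[p] = piv[p], piv[j]  # record the permutation
        for i in range(j + 1, n):
            a[i][j] /= a[j][j]               # multiplier, stored in L
            for c in range(j + 1, k + kb):   # update only inside the panel
                a[i][c] -= a[i][j] * a[j][c]


def update_trailing(a, k, kb):
    """High-parallelism step (assumed GPU side): triangular solve for the
    block row of U, then the Schur complement update of the trailing matrix."""
    n = len(a)
    end = k + kb
    # U12 = L11^{-1} A12 (forward substitution with unit lower-triangular L11).
    for c in range(end, n):
        for j in range(k, end):
            for i in range(j + 1, end):
                a[i][c] -= a[i][j] * a[j][c]
    # A22 = A22 - L21 * U12; every (i, c) entry is independent of the others.
    for i in range(end, n):
        for c in range(end, n):
            a[i][c] -= sum(a[i][j] * a[j][c] for j in range(k, end))


def blocked_lu(matrix, block=2):
    """Return (lu, piv): L and U packed in one matrix, plus the row order
    such that row i of L*U equals matrix[piv[i]]."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    piv = list(range(n))
    for k in range(0, n, block):
        kb = min(block, n - k)
        factor_panel(a, k, kb, piv)
        update_trailing(a, k, kb)
    return a, piv


def reconstruct(lu, piv):
    """Multiply the packed factors back together (L has a unit diagonal)."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for c in range(n):
            for j in range(min(i, c) + 1):
                l_ij = 1.0 if j == i else lu[i][j]
                out[i][c] += l_ij * lu[j][c]
    return out
```

Calling `blocked_lu(a, block=2)` on a small square matrix `a` and comparing `reconstruct(lu, piv)` with the rows of `a` taken in the order `piv` is a quick way to check the factorization. The Schur complement loop is where almost all of the O(N^3) work sits, which is why that step is the natural candidate for the GPU in this sketch.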

References in zbMATH (referenced in 11 articles)

  1. Fodor, Szabina; Németh, Zoltán: Numerical analysis of parallel implementation of the reorthogonalized ABS methods (2019)
  2. Wu, Rongteng; Xie, Xiaohong: A heterogeneous parallel LU factorization algorithm based on a basic column block uniform allocation strategy (2019)
  3. Piccinini, Enrico; Benedetti, Claudia; Siloi, Ilaria; Paris, Matteo G. A.; Bordone, Paolo: GPU-accelerated algorithms for many-particle continuous-time quantum walks (2017)
  4. Torky, Ahmed A.; Rashed, Youssef F.: GPU acceleration of the boundary element method for shear-deformable bending of plates (2017)
  5. Abdelfattah, Ahmad; Keyes, David; Ltaief, Hatem: KBLAS: an optimized library for dense matrix-vector multiplication on GPU accelerators (2016)
  6. Liquet, Benoît; Bottolo, Leonardo; Campanella, Gianluca; Richardson, Sylvia; Chadeau-Hyam, Marc: R2GUESS: A Graphics Processing Unit-Based R Package for Bayesian Variable Selection Regression of Multivariate Responses (2016) not zbMATH
  7. D’Azevedo, Eduardo; Hu, Zhiang; Su, Shi-Quan; Wong, Kwai: Solving a large scale radiosity problem on GPU-based parallel computers (2014)
  8. Niemeyer, Kyle E.; Sung, Chih-Jen: Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs (2014)
  9. Georgescu, Serban; Chow, Peter; Okuda, Hiroshi: GPU acceleration for FEM-based structural analysis (2013)
  10. Wang, Lu; Hu, Xiaozhe; Cohen, Jonathan; Xu, Jinchao: A parallel auxiliary grid algebraic multigrid method for graphic processing units (2013)
  11. Kiss, Imre; Pávó, József; Gyimóthy, Szabolcs: Acceleration of moment method using CUDA (2011)