The CUBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA® CUDA™ runtime. It gives the user access to the computational resources of an NVIDIA Graphics Processing Unit (GPU), but does not auto-parallelize across multiple GPUs. To use the CUBLAS library, the application must allocate the required matrices and vectors in the GPU memory space, fill them with data, call the sequence of desired CUBLAS functions, and then copy the results from the GPU memory space back to the host. The CUBLAS library also provides helper functions for writing data to and retrieving data from the GPU.

References in zbMATH (referenced in 32 articles)

Showing results 1 to 20 of 32.

  1. Bosner, Nela; Karlsson, Lars: Parallel and heterogeneous $m$-Hessenberg-triangular-triangular reduction (2017)
  2. Anzt, Hartwig; Chow, Edmond; Saak, Jens; Dongarra, Jack: Updating incomplete factorization preconditioners for model order reduction (2016)
  3. Chen, Yuxin; Keyes, David; Law, Kody J.H.; Ltaief, Hatem: Accelerated dimension-independent adaptive metropolis (2016)
  4. Magoulès, Frédéric; Ahamed, Abal-Kassim Cheik; Putanowicz, Roman: Auto-tuned Krylov methods on cluster of graphics processing unit (2015)
  5. Naumov, M.; Arsaev, M.; Castonguay, P.; Cohen, J.; Demouth, J.; Eaton, J.; Layton, S.; Markovskiy, N.; Reguly, I.; Sakharnykh, N.; Sellappan, V.; Strzodka, R.: AmgX: a library for GPU accelerated algebraic multigrid and preconditioned iterative methods (2015)
  6. Ploskas, Nikolaos; Samaras, Nikolaos: Efficient GPU-based implementations of simplex type algorithms (2015)
  7. Wong, Kwai; D’Azevedo, Eduardo; Hu, Zhiang; Kail, Andrew; Su, Shiquan: Solving a large-scale thermal radiation problem using an interoperable executive library framework on petascale supercomputers (2015)
  8. Birk, Matthias; Dapp, Robin; Ruiter, N.V.; Becker, J.: GPU-based iterative transmission reconstruction in 3D ultrasound computer tomography (2014)
  9. Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco: 3D data denoising via nonlocal means filter by using parallel GPU strategies (2014)
  10. Dastgeer, Usman; Li, Lu; Kessler, Christoph: The PEPPHER composition tool: performance-aware composition for GPU-based systems (2014)
  11. Gao, Jiaquan; Liang, Ronghua; Wang, Jun: Research on the conjugate gradient algorithm with a modified incomplete Cholesky preconditioner on GPU (2014)
  12. Lopez, M. Graham; Horton, Mitchel D.: Batch matrix exponentiation (2014)
  13. Abdelfattah, Ahmad; Keyes, David; Ltaief, Hatem: Systematic approach in optimizing numerical memory-bound kernels on GPU (2013)
  14. Hogg, J.D.: A fast dense triangular solve in CUDA (2013)
  15. Knepley, Matthew G.; Terrel, Andy R.: Finite element integration on GPGPUs (2013)
  16. Li, Liu; Hou, Wenguang; Zhang, Xuming; Ding, Mingyue: GPU-based block-wise nonlocal means denoising for 3D ultrasound images (2013)
  17. Wang, Xin; Zhang, Bin; Cao, Xu; Liu, Fei; Luo, Jianwen; Bai, Jing: Acceleration of early-photon fluorescence molecular tomography with graphics processing units (2013)
  18. Bell, Nathan; Dalton, Steven; Olson, Luke N.: Exposing fine-grained parallelism in algebraic multigrid methods (2012)
  19. Galiano, V.; Migallón, H.; Migallón, V.; Penadés, J.: GPU-based parallel algorithms for sparse nonlinear systems (2012)
  20. Kramer, Stephan C.: CUDA-based scientific computing. Tools and selected applications (2012)
