The CUBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA® CUDA™ runtime. It gives the user access to the computational resources of an NVIDIA graphics processing unit (GPU), but it does not automatically parallelize work across multiple GPUs. To use the CUBLAS library, the application must allocate the required matrices and vectors in GPU memory, fill them with data, call the desired sequence of CUBLAS functions, and then copy the results from GPU memory back to the host. The CUBLAS library also provides helper functions for writing data to and retrieving data from the GPU.

References in zbMATH (referenced in 37 articles; results 1 to 20, sorted by year):

  1. Alonso, Pedro; Ibáñez, Javier; Sastre, Jorge; Peinado, Jesús; Defez, Emilio: Efficient and accurate algorithms for computing matrix trigonometric functions (2017)
  2. Bosner, Nela; Karlsson, Lars: Parallel and heterogeneous $m$-Hessenberg-triangular-triangular reduction (2017)
  3. Nugteren, Cedric: CLBlast: A Tuned OpenCL BLAS Library (2017) arXiv
  4. Abdelfattah, Ahmad; Keyes, David; Ltaief, Hatem: KBLAS: an optimized library for dense matrix-vector multiplication on GPU accelerators (2016)
  5. Anzt, Hartwig; Chow, Edmond; Saak, Jens; Dongarra, Jack: Updating incomplete factorization preconditioners for model order reduction (2016)
  6. Chen, Yuxin; Keyes, David; Law, Kody J.H.; Ltaief, Hatem: Accelerated dimension-independent adaptive metropolis (2016)
  7. Moskewicz, Matthew; Iandola, Forrest; Keutzer, Kurt: Boda-RTC: Productive Generation of Portable, Efficient Code for Convolutional Neural Networks on Mobile Computing Platforms (2016) arXiv
  8. Magoulès, Frédéric; Ahamed, Abal-Kassim Cheik; Putanowicz, Roman: Auto-tuned Krylov methods on cluster of graphics processing unit (2015)
  9. Naumov, M.; Arsaev, M.; Castonguay, P.; Cohen, J.; Demouth, J.; Eaton, J.; Layton, S.; Markovskiy, N.; Reguly, I.; Sakharnykh, N.; Sellappan, V.; Strzodka, R.: AmgX: a library for GPU accelerated algebraic multigrid and preconditioned iterative methods (2015)
  10. Ploskas, Nikolaos; Samaras, Nikolaos: Efficient GPU-based implementations of simplex type algorithms (2015)
  11. Wong, Kwai; D’Azevedo, Eduardo; Hu, Zhiang; Kail, Andrew; Su, Shiquan: Solving a large-scale thermal radiation problem using an interoperable executive library framework on petascale supercomputers (2015)
  12. Birk, Matthias; Dapp, Robin; Ruiter, N.V.; Becker, J.: GPU-based iterative transmission reconstruction in 3D ultrasound computer tomography (2014) ioport
  13. Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco: 3D data denoising via nonlocal means filter by using parallel GPU strategies (2014)
  14. Dastgeer, Usman; Li, Lu; Kessler, Christoph: The PEPPHER composition tool: performance-aware composition for GPU-based systems (2014) ioport
  15. Gao, Jiaquan; Liang, Ronghua; Wang, Jun: Research on the conjugate gradient algorithm with a modified incomplete Cholesky preconditioner on GPU (2014) ioport
  16. Lopez, M. Graham; Horton, Mitchel D.: Batch matrix exponentiation (2014)
  17. Abdelfattah, Ahmad; Keyes, David; Ltaief, Hatem: Systematic approach in optimizing numerical memory-bound kernels on GPU (2013) ioport
  18. Hogg, J.D.: A fast dense triangular solve in CUDA (2013)
  19. Knepley, Matthew G.; Terrel, Andy R.: Finite element integration on GPGPUs (2013)
  20. Li, Liu; Hou, Wenguang; Zhang, Xuming; Ding, Mingyue: GPU-based block-wise nonlocal means denoising for 3D ultrasound images (2013)
