CUBLAS

The CUBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA® CUDA™ runtime. It gives the user access to the computational resources of an NVIDIA Graphics Processing Unit (GPU), but does not auto-parallelize across multiple GPUs. To use the CUBLAS library, the application must allocate the required matrices and vectors in the GPU memory space, fill them with data, call the desired sequence of CUBLAS functions, and then copy the results from the GPU memory space back to the host. The CUBLAS library also provides helper functions for writing data to and retrieving data from the GPU.


References in zbMATH (referenced in 48 articles)

Showing results 1 to 20 of 48, sorted by year.

  1. Bosner, Nela; Bujanović, Zvonimir; Drmač, Zlatko: Parallel solver for shifted systems in a hybrid CPU-GPU framework (2018)
  2. Defez, Emilio; Ibáñez, Javier; Sastre, Jorge; Peinado, Jesús; Alonso, Pedro: A new efficient and accurate spline algorithm for the matrix exponential computation (2018)
  3. Yang, Wangdong; Li, Kenli; Li, Keqin: A parallel computing method using blocked format with optimal partitioning for SpMV on GPU (2018)
  4. Alonso, Pedro; Ibáñez, Javier; Sastre, Jorge; Peinado, Jesús; Defez, Emilio: Efficient and accurate algorithms for computing matrix trigonometric functions (2017)
  5. Al-Refaie, Ahmed F.; Yurchenko, Sergei N.; Tennyson, Jonathan: GPU accelerated intensities MPI (GAIN-MPI): a new method of computing Einstein-$A$ coefficients (2017)
  6. Bosner, Nela; Karlsson, Lars: Parallel and heterogeneous $m$-Hessenberg-triangular-triangular reduction (2017)
  7. Nugteren, Cedric: CLBlast: A Tuned OpenCL BLAS Library (2017) arXiv
  8. Chan, Jesse; Warburton, T.: GPU-accelerated Bernstein-Bézier discontinuous Galerkin methods for wave problems (2017)
  9. Gao, Jiaquan; Wu, Kesong; Wang, Yushun; Qi, Panpan; He, Guixia: GPU-accelerated preconditioned GMRES method for two-dimensional Maxwell’s equations (2017)
  10. Abdelfattah, Ahmad; Keyes, David; Ltaief, Hatem: KBLAS: an optimized library for dense matrix-vector multiplication on GPU accelerators (2016)
  11. Anzt, Hartwig; Chow, Edmond; Saak, Jens; Dongarra, Jack: Updating incomplete factorization preconditioners for model order reduction (2016)
  12. Chen, Yuxin; Keyes, David; Law, Kody J. H.; Ltaief, Hatem: Accelerated dimension-independent adaptive metropolis (2016)
  13. Moskewicz, Matthew; Iandola, Forrest; Keutzer, Kurt: Boda-RTC: Productive Generation of Portable, Efficient Code for Convolutional Neural Networks on Mobile Computing Platforms (2016) arXiv
  14. Paciorek, Christopher; Lipshitz, Benjamin; Zhuo, Wei; Prabhat; Kaufman, Cari G.; Thomas, Rollin: Parallelizing Gaussian Process Calculations in R (2015)
  15. D’Amore, L.; Laccetti, G.; Romano, D.; Scotti, G.; Murli, A.: Towards a parallel component in a GPU-CUDA environment: a case study with the L-BFGS Harwell routine (2015)
  16. Magoulès, Frédéric; Ahamed, Abal-Kassim Cheik; Putanowicz, Roman: Auto-tuned Krylov methods on cluster of graphics processing unit (2015)
  17. Naumov, M.; Arsaev, M.; Castonguay, P.; Cohen, J.; Demouth, J.; Eaton, J.; Layton, S.; Markovskiy, N.; Reguly, I.; Sakharnykh, N.; Sellappan, V.; Strzodka, R.: AmgX: a library for GPU accelerated algebraic multigrid and preconditioned iterative methods (2015)
  18. Ploskas, Nikolaos; Samaras, Nikolaos: Efficient GPU-based implementations of simplex type algorithms (2015)
  19. Wong, Kwai; D’Azevedo, Eduardo; Hu, Zhiang; Kail, Andrew; Su, Shiquan: Solving a large-scale thermal radiation problem using an interoperable executive library framework on petascale supercomputers (2015)
  20. Birk, Matthias; Dapp, Robin; Ruiter, N. V.; Becker, J.: GPU-based iterative transmission reconstruction in 3D ultrasound computer tomography (2014) ioport
