The CUBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA® CUDA™ runtime. It gives the user access to the computational resources of an NVIDIA Graphics Processing Unit (GPU), but it does not auto-parallelize across multiple GPUs. To use the CUBLAS library, the application must allocate the required matrices and vectors in the GPU memory space, fill them with data, call the desired sequence of CUBLAS functions, and then copy the results from the GPU memory space back to the host. The CUBLAS library also provides helper functions for writing data to and retrieving data from the GPU.

References in zbMATH (referenced in 82 articles)

Showing results 1 to 20 of 82, sorted by year.


  1. Abdelfattah, Ahmad; Costa, Timothy; Dongarra, Jack; Gates, Mark; Haidar, Azzam; Hammarling, Sven; Higham, Nicholas J.; Kurzak, Jakub; Luszczek, Piotr; Tomov, Stanimire; Zounon, Mawussi: A set of batched basic linear algebra subprograms and LAPACK routines (2021)
  2. Bosner, Nela: Parallel Prony’s method with multivariate matrix pencil approach and its numerical aspects (2021)
  3. Bosner, Nela: Parallel reduction of four matrices to condensed form for a generalized matrix eigenvalue algorithm (2021)
  4. Dong, W.; Kang, B.: Evaluation of gas sales agreements with indexation using tree and least-squares Monte Carlo methods on graphics processing units (2021)
  5. Świrydowicz, Katarzyna; Langou, Julien; Ananthan, Shreyas; Yang, Ulrike; Thomas, Stephen: Low synchronization Gram-Schmidt and generalized minimal residual algorithms (2021)
  6. Ahrens, Peter; Demmel, James; Nguyen, Hong Diep: Algorithms for efficient reproducible floating point summation (2020)
  7. Bartelt, M.; Klöckner, O.; Dietzsch, J.; Groß, M.: Higher order finite elements in space and time for anisotropic simulations with variational integrators. Application of an efficient GPU implementation (2020)
  8. Fabien, Maurice S.: A GPU-accelerated hybridizable discontinuous Galerkin method for linear elasticity (2020)
  9. Huang, Jianyu; Yu, Chenhan D.; van de Geijn, Robert A.: Strassen’s algorithm reloaded on GPUs (2020)
  10. Kang, Homin; Kwon, Hyuck Chan; Kim, Duksu: HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs (2020)
  11. Seyoon Ko, Hua Zhou, Jin Zhou, Joong-Ho Won: DistStat.jl: Towards Unified Programming for High-Performance Statistical Computing Environments in Julia (2020) arXiv
  12. Xiaohui Wang, Ying Xiong, Yang Wei, Mingxuan Wang, Lei Li: LightSeq: A High Performance Inference Library for Sequence Processing and Generation (2020) arXiv
  13. Bernaschi, Massimo; Carrozzo, Mauro; Franceschini, Andrea; Janna, Carlo: A dynamic pattern factored sparse approximate inverse preconditioner on graphics processing units (2019)
  14. Berrone, S.; D’Auria, A.; Vicini, F.: Fast and robust flow simulations in discrete fracture networks with GPGPUs (2019)
  15. Cheng, Xuan; Zeng, Ming; Lin, Jinpeng; Wu, Zizhao; Liu, Xinguo: Efficient $L_0$ resampling of point sets (2019)
  16. Chopp, D. L.: Introduction to high performance scientific computing (2019)
  17. Defez, Emilio; Ibáñez, Javier; Peinado, Jesús; Sastre, Jorge; Alonso-Jordá, Pedro: An efficient and accurate algorithm for computing the matrix cosine based on new Hermite approximations (2019)
  18. Du, Cheng-Han; Chiou, Yih-Peng; Wang, Weichung: Compressed hierarchical Schur algorithm for frequency-domain analysis of photonic structures (2019)
  19. Flegar, Goran; Scheidegger, Florian; Novaković, Vedran; Mariani, Giovani; Tomás, Andrés E.; Malossi, A. Cristiano I.; Quintana-Ortí, Enrique S.: FloatX: A C++ library for customized floating-point arithmetic (2019)
  20. Jaber J. Hasbestan, Inanc Senocak: PittPack: An Open-Source Poisson’s Equation Solver for Extreme-Scale Computing with Accelerators (2019) arXiv
