The CUBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA® CUDA™ runtime. It gives applications access to the computational resources of an NVIDIA graphics processing unit (GPU), but it does not automatically parallelize work across multiple GPUs. To use the CUBLAS library, an application must allocate the required matrices and vectors in GPU memory, fill them with data, call the desired sequence of CUBLAS functions, and then copy the results from GPU memory back to the host. The CUBLAS library also provides helper functions for writing data to and retrieving data from the GPU.

References in zbMATH (referenced in 79 articles)

Showing results 1 to 20 of 79, sorted by year.

  1. Bosner, Nela: Parallel Prony’s method with multivariate matrix pencil approach and its numerical aspects (2021)
  2. Bosner, Nela: Parallel reduction of four matrices to condensed form for a generalized matrix eigenvalue algorithm (2021)
  3. Dong, W.; Kang, B.: Evaluation of gas sales agreements with indexation using tree and least-squares Monte Carlo methods on graphics processing units (2021)
  4. Ahrens, Peter; Demmel, James; Nguyen, Hong Diep: Algorithms for efficient reproducible floating point summation (2020)
  5. Bartelt, M.; Klöckner, O.; Dietzsch, J.; Groß, M.: Higher order finite elements in space and time for anisotropic simulations with variational integrators. Application of an efficient GPU implementation (2020)
  6. Huang, Jianyu; Yu, Chenhan D.; Geijn, Robert A. van de: Strassen’s algorithm reloaded on GPUs (2020)
  7. Kang, Homin; Kwon, Hyuck Chan; Kim, Duksu: HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs (2020)
  8. Ko, Seyoon; Zhou, Hua; Zhou, Jin; Won, Joong-Ho: DistStat.jl: Towards Unified Programming for High-Performance Statistical Computing Environments in Julia (2020) (arXiv)
  9. Wang, Xiaohui; Xiong, Ying; Wei, Yang; Wang, Mingxuan; Li, Lei: LightSeq: A High Performance Inference Library for Sequence Processing and Generation (2020) (arXiv)
  10. Bernaschi, Massimo; Carrozzo, Mauro; Franceschini, Andrea; Janna, Carlo: A dynamic pattern factored sparse approximate inverse preconditioner on graphics processing units (2019)
  11. Berrone, S.; D’Auria, A.; Vicini, F.: Fast and robust flow simulations in discrete fracture networks with GPGPUs (2019)
  12. Cheng, Xuan; Zeng, Ming; Lin, Jinpeng; Wu, Zizhao; Liu, Xinguo: Efficient L_0 resampling of point sets (2019)
  13. Chopp, D. L.: Introduction to high performance scientific computing (2019)
  14. Defez, Emilio; Ibáñez, Javier; Peinado, Jesús; Sastre, Jorge; Alonso-Jordá, Pedro: An efficient and accurate algorithm for computing the matrix cosine based on new Hermite approximations (2019)
  15. Du, Cheng-Han; Chiou, Yih-Peng; Wang, Weichung: Compressed hierarchical Schur algorithm for frequency-domain analysis of photonic structures (2019)
  16. Flegar, Goran; Scheidegger, Florian; Novaković, Vedran; Mariani, Giovani; Tomás, Andrés E.; Malossi, A. Cristiano I.; Quintana-Ortí, Enrique S.: FloatX: A C++ library for customized floating-point arithmetic (2019)
  17. Hasbestan, Jaber J.; Senocak, Inanc: PittPack: An Open-Source Poisson’s Equation Solver for Extreme-Scale Computing with Accelerators (2019) (arXiv)
  18. Li, Ruipeng; Xi, Yuanzhe; Erlandson, Lucas; Saad, Yousef: The eigenvalues slicing library (EVSL): algorithms, implementation, and software (2019)
  19. Sastre, Jorge; Ibáñez, Javier; Alonso-Jordá, Pedro; Peinado, Jesús; Defez, Emilio: Fast Taylor polynomial evaluation for the computation of the matrix cosine (2019)
  20. Besard, Tim; Churavy, Valentin; Edelman, Alan; De Sutter, Bjorn: Rapid software prototyping for heterogeneous and distributed platforms (2019) (not in zbMATH)