• Sailfish

  • Referenced in 10 articles [sw16828]
  • method (LBM) on modern Graphics Processing Units (GPUs) using CUDA/OpenCL. We take a novel approach ... principles of the code, scaling to multiple GPUs in a distributed environment, as well...
• GPUTeraSort

  • Referenced in 14 articles [sw12706]
  • novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions...
• LibBi

  • Referenced in 14 articles [sw19384]
  • units (CPUs), many-core graphics processing units (GPUs) and distributed-memory clusters of such devices...
• OmpSs

  • Referenced in 13 articles [sw24813]
  • support asynchronous parallelism and heterogeneity (devices like GPUs). However, it can also be understood...
• OP2

  • Referenced in 7 articles [sw17501]
  • applications on a distributed memory cluster of GPUs. We discuss the main design issues ... code for execution on a cluster of GPUs. A representative CFD application written using ... Sandy Bridge) and AMD (Magny-Cours), GPUs from NVIDIA (GTX560Ti, Tesla C2070), a distributed memory ... distributed memory GPU cluster (Tesla C2050 GPUs with InfiniBand). OP2’s design choices are explored...
• ACEMD

  • Referenced in 7 articles [sw21783]
  • intrinsic parallelism of recent graphical processing units (GPUs) can offer a technological edge for molecular ... AMBER force fields. Designed specifically for GPUs it is able to achieve supercomputing scale performance ... single workstation computer equipped with just 3 GPUs. We believe that microsecond time scale molecular...
• Vc

  • Referenced in 8 articles [sw21533]
  • explicit vectorization. Recent generations of CPUs, and GPUs in particular, require data-parallel codes ... applied to different input data. CPUs and GPUs can thus reduce the necessary hardware...
• CULA

  • Referenced in 11 articles [sw12745]
  • execution model featured by NVIDIA GPUs based on CUDA demands very strong parallelism, requiring between...
• cuDNN

  • Referenced in 11 articles [sw17848]
  • learning workloads. Our implementation contains routines for GPUs, although similarly to the BLAS library, these...
• MADE

  • Referenced in 11 articles [sw36209]
  • deep ones. Vectorized implementations, such as on GPUs, are simple and fast. Experiments demonstrate that...
• Hom4PS-3

  • Referenced in 10 articles [sw08783]
  • core systems, computer clusters, distributed environments, and GPUs with great efficiency and scalability. Designed...
• GAMER

  • Referenced in 10 articles [sw10937]
  • taking advantage of the extraordinary performance of GPUs, up to two orders of magnitude performance...
• GPUVerify

  • Referenced in 10 articles [sw11260]
  • programs that run on graphics processing units (GPUs). We provide a novel lock-step execution...
• clSpMV

  • Referenced in 10 articles [sw12638]
  • cross-platform OpenCL SpMV framework on GPUs. Sparse matrix vector multiplication (SpMV) kernel...
• TheLMA

  • Referenced in 10 articles [sw12960]
  • hardware, our solver may therefore run eight GPUs in parallel, which allows us to perform...
• CNTK

  • Referenced in 10 articles [sw21056]
  • with automatic differentiation and parallelization across multiple GPUs and servers. CNTK has been available under...
• kdtree++

  • Referenced in 10 articles [sw28526]
  • many-core architectures. The graphics processing units (GPUs) are gaining more and more popularity...
• STOCHSIMGPU

  • Referenced in 6 articles [sw10711]
  • systems in parallel on graphics processing units (GPUs). STOCHSIMGPU is tightly integrated into the Systems ... software tool STOCHSIMGPU which exploits GPUs for parallel stochastic simulations of biological/chemical reaction systems...
• FE-gMG

  • Referenced in 8 articles [sw10365]
  • complete FEM-based simulation toolkit on GPUs: unstructured grid finite element geometric multigrid solvers with...