QUDA: A library for QCD on GPUs

QUDA is a library for performing lattice QCD calculations on graphics processing units (GPUs), built on NVIDIA's CUDA platform. The current release includes optimized Dirac operators and solvers for the following fermion actions: Wilson; clover-improved Wilson; twisted mass (degenerate or non-degenerate); twisted mass with a clover term; staggered; improved staggered (asqtad or HISQ); domain wall (4-d or 5-d preconditioned); and Möbius. Implementations of CG, multi-shift CG, BiCGstab, and domain-decomposition (DD) preconditioned GCR are provided, including robust mixed-precision variants that support combinations of double, single, and half (16-bit "block floating point") precision. The library also includes auxiliary routines needed for Hybrid Monte Carlo, such as HISQ link fattening, force terms, and clover-field construction. Multi-GPU parallelism is supported throughout, with communication handled by QMP or MPI. Several commonly used packages integrate QUDA support as a compile-time option, including Chroma, MILC, CPS, and BQCD (in a specific branch).
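To give a feel for what a 16-bit "block floating point" format means, here is a minimal, generic sketch: each block of values is stored as 16-bit signed integers together with one shared floating-point scale. This illustrates the general idea only and is not QUDA's actual storage layout or API. The function names `bfp_encode` and `bfp_decode`, the block contents, and the choice of the block's largest magnitude as the shared scale are all assumptions made for this example.

```python
# Generic block-floating-point sketch (hypothetical helpers, not QUDA code).
# A block of floats is stored as signed integers plus one shared scale.

def bfp_encode(block, bits=16):
    """Quantize a block of floats to signed integers sharing one scale."""
    qmax = (1 << (bits - 1)) - 1            # 32767 for 16 bits
    scale = max(abs(x) for x in block)      # assumption: scale = max magnitude
    if scale == 0.0:
        return 0.0, [0] * len(block)
    # Map [-scale, scale] onto [-qmax, qmax] and round to the nearest integer.
    mantissas = [round(x / scale * qmax) for x in block]
    return scale, mantissas


def bfp_decode(scale, mantissas, bits=16):
    """Reconstruct approximate floats from integers and the shared scale."""
    qmax = (1 << (bits - 1)) - 1
    return [m * scale / qmax for m in mantissas]


block = [0.5, -1.25, 3.0, 0.001]
scale, q = bfp_encode(block)
restored = bfp_decode(scale, q)

# The absolute error per element is bounded by about scale / (2 * qmax).
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(scale, q, max_err)
```

Because the scale is shared, the absolute error is roughly uniform across the block, so elements much smaller than the block maximum lose relative accuracy. This is one reason low-precision storage is usually paired with a higher-precision correction step in mixed-precision solvers.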