GPUTeraSort: high performance graphics co-processor sorting for large database management. We present a novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions of records and wide keys. Our algorithm uses the data parallelism within a GPU along with task parallelism by scheduling some of the memory-intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory interfaces on the same PC -- a fast and dedicated memory interface on the GPU along with the main memory interface for CPU computations. As a result, we achieve higher memory bandwidth as compared to CPU-based algorithms running on commodity PCs. Our approach takes into account the limited communication bandwidth between the CPU and the GPU, and reduces the data communication between the two processors. Our algorithm also improves the performance of disk transfers and achieves close to peak I/O performance. We have tested the performance of our algorithm on the SortBenchmark and applied it to large databases composed of a few hundred Gigabytes of data. Our results on a 3 GHz Pentium IV PC with $300 NVIDIA 7800 GT GPU indicate a significant performance improvement over optimized CPU-based algorithms on high-end PCs with 3.6 GHz Dual Xeon processors. Our implementation is able to outperform the current high-end PennySort benchmark and results in a higher performance to price ratio. Overall, our results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.$

This software is also peer reviewed by journal TOMS.

References in zbMATH (referenced in 13 articles )

Showing results 1 to 13 of 13.
Sorted by year (citations)

  1. Beliakov, G.; Johnstone, M.; Nahavandi, S.: Computing of high breakdown regression estimators without sorting on graphics processing units (2012)
  2. Breß, Sebastian; Geist, Ingolf; Schallehn, Eike; Mory, Maik; Saake, Gunter: A framework for cost based optimization of hybrid CPU/GPU query plans in database systems (2012)
  3. Campobello, Giuseppe; Patanè, Giuseppe; Russo, Marco: On the complexity of min-max sorting networks (2012)
  4. Hong, Chun-Tao; Chen, De-Hao; Chen, Yu-Bei; Chen, Wen-Guang; Zheng, Wei-Min; Lin, Hai-Bo: Providing source code level portability between CPU and GPU with MapCG (2012) ioport
  5. Pratas, Frederico; Trancoso, Pedro; Sousa, Leonel; Stamatakis, Alexandros; Shi, Guochun; Kindratenko, Volodymyr: Fine-grain parallelism using multi-core, cell/BE, and GPU systems (2012) ioport
  6. Andrzejewski, Witold; Wrembel, Robert: GPU-PLWAH: GPU-based implementation of the PLWAH algorithm for compressing bitmaps (2011)
  7. Bertasi, Paolo; Bressan, Marco; Peserico, Enoch: PSort, yet another fast stable sorting software (2011) ioport
  8. Edelkamp, Stefan; Sulewski, Damian: External memory breadth-first search with delayed duplicate detection on the GPU (2011) ioport
  9. Man, Duhu; Ito, Yasuaki; Nakano, Koji: An efficient parallel sorting compatible with the standard qsort (2011)
  10. Hjelmervik, Jon Mikkelsen; Léon, Jean-Claude: Simplification of FEM-models on Cell BE (2010)
  11. Cederman, Daniel; Tsigas, Philippas: GPU-quicksort, a practical quicksort algorithm for graphics processors (2009)
  12. Gedik, Buğra; Bordawekar, Rajesh R.; Yu, Philip S.: CellJoin: a parallel stream join operator for the cell processor (2009) ioport
  13. Sintorn, Erik; Assarsson, Ulf: Fast parallel GPU-sorting using a hybrid algorithm (2008)