Loo.py
Array Program Transformation with Loo.py by Example: High-Order Finite Elements. To concisely and effectively demonstrate the capabilities of our program transformation system Loo.py, we examine a transformation path from two real-world Fortran subroutines, as found in a weather model, to a single high-performance computational kernel suitable for execution on modern GPU hardware. Along the transformation path, we encounter kernel fusion, vectorization, prefetching, parallelization, and algorithmic changes achieved by mechanized conversion between imperative and functional/substitution-based code, among others. We conclude with performance results that demonstrate the effects and support the effectiveness of the applied transformations.
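To make one of the transformations named in the abstract concrete, here is a minimal sketch of kernel fusion in plain Python. It deliberately does not use the Loo.py API; the function names `unfused` and `fused` and the sample arrays are hypothetical and chosen only for illustration. The sketch assumes two elementwise "kernels" where the second consumes the output of the first, and shows that fusing them removes the intermediate array without changing the result.

```python
# Illustrative only: plain Python lists stand in for arrays.
# This is NOT the Loo.py API; all names here are hypothetical.

def unfused(a, b):
    """Two separate 'kernels': the first writes a temporary, the second reads it."""
    tmp = [x * y for x, y in zip(a, b)]   # kernel 1: elementwise product
    return [t + 1.0 for t in tmp]         # kernel 2: add a constant to the temporary

def fused(a, b):
    """One fused 'kernel': same arithmetic, but no temporary array is stored."""
    return [x * y + 1.0 for x, y in zip(a, b)]

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
result_unfused = unfused(a, b)
result_fused = fused(a, b)
```

Both versions compute the same values; the fused form makes a single pass over the data, which on GPU hardware typically means less traffic to and from global memory.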
References in zbMATH (referenced in 3 articles)
- Kempf, Dominic; Heß, René; Müthing, Steffen; Bastian, Peter: Automatic code generation for high-performance discontinuous Galerkin methods on modern architectures (2021)
- Luporini, Fabio; Louboutin, Mathias; Lange, Michael; Kukreja, Navjot; Witte, Philipp; Hückelheim, Jan; Yount, Charles; Kelly, Paul H. J.; Herrmann, Felix J.; Gorman, Gerard J.: Architecture and performance of Devito, a system for automated stencil computation (2020)
- Svensson, Bo Joel; Newton, Ryan R.; Sheeran, Mary: A language for hierarchical data parallel design-space exploration on GPUs (2016)