OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures.
OP2 is an “active” library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into different parallel implementations for execution on different back-end hardware platforms. In this paper we present the design of the current OP2 library, and investigate its capabilities in achieving performance portability, near-optimal performance, and scaling on modern multi-core and many-core processor-based systems. A key feature of this work is OP2’s recent extension facilitating the development and execution of applications on a distributed memory cluster of GPUs. We discuss the main design issues in parallelizing unstructured mesh-based applications on heterogeneous platforms. These include handling data dependencies in accessing indirectly referenced data, the impact of unstructured mesh data layouts (array of structs vs. struct of arrays), and design considerations in generating code for execution on a cluster of GPUs. A representative CFD application written using the OP2 framework is utilized to provide a contrasting benchmarking and performance analysis study on a range of multi-core/many-core systems. These include multi-core CPUs from Intel (Westmere and Sandy Bridge) and AMD (Magny-Cours), GPUs from NVIDIA (GTX560Ti, Tesla C2070), a distributed memory CPU cluster (Cray XE6), and a distributed memory GPU cluster (Tesla C2050 GPUs with InfiniBand). OP2’s design choices are explored with quantitative insights into their contributions to performance.
We demonstrate that an application written once at a high level using the OP2 API is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.
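The two design issues named in the abstract, data dependencies in indirectly referenced data and array-of-structs versus struct-of-arrays layouts, can be illustrated with a small sketch. It is not OP2 code and uses no OP2 API: all function names are hypothetical, and the greedy edge coloring is one common way to avoid conflicting indirect increments, assumed here for illustration since the abstract does not specify the mechanism.

```python
# Illustrative sketch only; none of these names come from the OP2 API.

def color_edges(edge_to_node):
    """Greedily give each edge the smallest color not already used by
    an earlier edge that shares a node with it."""
    used = {}    # node -> set of colors already taken at that node
    colors = []
    for a, b in edge_to_node:
        taken = used.setdefault(a, set()) | used.setdefault(b, set())
        c = 0
        while c in taken:
            c += 1
        colors.append(c)
        used[a].add(c)
        used[b].add(c)
    return colors


def aos_index(elem, comp, n_elems, n_comps):
    """Array of structs: the components of one element are contiguous."""
    return elem * n_comps + comp


def soa_index(elem, comp, n_elems, n_comps):
    """Struct of arrays: one component of all elements is contiguous."""
    return comp * n_elems + elem


def edge_loop(edge_to_node, edge_flux, n_nodes):
    """Indirectly increment node data through the edge-to-node map,
    processing one color at a time."""
    colors = color_edges(edge_to_node)
    node_res = [0.0] * n_nodes
    n_colors = max(colors, default=-1) + 1
    for c in range(n_colors):
        # Edges of one color touch disjoint nodes, so the iterations of
        # this inner loop could run in parallel without write conflicts.
        for e, (a, b) in enumerate(edge_to_node):
            if colors[e] == c:
                node_res[a] += edge_flux[e]
                node_res[b] -= edge_flux[e]
    return node_res
```

With the array-of-structs index, all components of one mesh element sit next to each other, which tends to suit cache-based CPUs; with the struct-of-arrays index, neighbouring elements' values of the same component are adjacent, which tends to suit coalesced access on GPUs. Which layout performs better in practice is the kind of question the paper's benchmarks address.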
References in zbMATH (referenced in 5 articles)
- Mohanamuraly, P.; Hascoët, L.; Müller, J.-D.: Seeding and adjoining zero-halo partitioned parallel scientific codes (2020)
- Reguly, István Z.; Mudalige, Gihan R.: Productivity, performance, and portability for computational fluid dynamics applications (2020)
- Luporini, Fabio; Lange, Michael; Jacobs, Christian T.; Gorman, Gerard J.; Ramanujam, J.; Kelly, Paul H. J.: Automated tiling of unstructured mesh computations with application to seismological modeling (2019)
- Guillas, Serge; Sarri, Andria; Day, Simon J.; Liu, Xiaoyu; Dias, Frederic: Functional emulation of high resolution tsunami modelling over Cascadia (2018)
- Lange, Michael; Mitchell, Lawrence; Knepley, Matthew G.; Gorman, Gerard J.: Efficient mesh management in Firedrake using PETSc DMPlex (2016)