hwloc: A generic framework for managing hardware affinities in HPC applications. The increasing numbers of cores, shared caches and memory nodes within machines introduce a complex hardware topology. High-performance computing applications now have to carefully adapt their placement and behavior according to the underlying hierarchy of hardware resources and their software affinities. We introduce the Hardware Locality (hwloc) software, which gathers hardware information about processors, caches, memory nodes and more, and exposes it to applications and runtime systems in an abstracted and portable hierarchical manner. hwloc may significantly help performance by letting runtime systems place their tasks or adapt their communication strategies depending on hardware affinities. We show that hwloc can already be used by popular high-performance OpenMP or MPI software. Indeed, scheduling OpenMP threads according to their affinities or placing MPI processes according to their communication patterns shows interesting performance improvements thanks to hwloc. An optimized MPI communication strategy may also be chosen dynamically according to the location of the communicating processes in the machine and its hardware characteristics.
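
To make the idea of a hierarchical topology concrete, here is a minimal Python sketch of a toy hardware tree. It does not use the hwloc API: the `Node` class, the machine layout (2 NUMA nodes, 2 L2 caches each, 2 cores per cache) and the strategy names are all hypothetical, chosen only to illustrate how the location of two cores in the hierarchy could drive the choice of a communication strategy.

```python
class Node:
    """One object in a toy hardware tree (hypothetical, not an hwloc type)."""

    def __init__(self, kind, index, parent=None):
        self.kind = kind
        self.index = index
        self.parent = parent
        self.children = []

    def add(self, kind, index):
        """Attach and return a new child object."""
        child = Node(kind, index, parent=self)
        self.children.append(child)
        return child


def ancestors(node):
    """Return the chain from node up to the root, node included."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def common_ancestor(a, b):
    """Return the deepest object that contains both a and b."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def build_toy_machine():
    """Assumed layout: 2 NUMA nodes x 2 L2 caches x 2 cores = 8 cores."""
    machine = Node("Machine", 0)
    cores = []
    for numa in range(2):
        numa_node = machine.add("NUMANode", numa)
        for l2 in range(2):
            cache = numa_node.add("L2Cache", numa * 2 + l2)
            for _ in range(2):
                cores.append(cache.add("Core", len(cores)))
    return machine, cores


# Hypothetical strategy labels, keyed by the kind of the shared ancestor.
STRATEGY_BY_SHARED_LEVEL = {
    "Core": "same core",
    "L2Cache": "copy through shared cache",
    "NUMANode": "shared memory within NUMA node",
    "Machine": "copy across NUMA nodes",
}


def pick_strategy(core_a, core_b):
    """Choose a strategy label from where the two cores meet in the tree."""
    return STRATEGY_BY_SHARED_LEVEL[common_ancestor(core_a, core_b).kind]


if __name__ == "__main__":
    _, cores = build_toy_machine()
    for a, b in [(0, 1), (0, 2), (0, 4)]:
        print(f"cores {a} and {b}: {pick_strategy(cores[a], cores[b])}")
```

In this toy layout, cores 0 and 1 meet at an L2 cache, cores 0 and 2 at a NUMA node, and cores 0 and 4 only at the machine level, so each pair gets a different label. A real application would obtain the tree from hwloc itself rather than building it by hand.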

References in zbMATH (referenced in 9 articles)


  1. Duff, Iain; Hogg, Jonathan; Lopez, Florent: A new sparse LDL^T solver using a posteriori threshold pivoting (2020)
  2. Gratien, Jean-Marc: A robust and scalable multi-level domain decomposition preconditioner for multi-core architecture with large number of cores (2020)
  3. Liu, Weifeng; Zhou, Jie; Guo, Meng: Topology-aware strategy for MPI-IO operations in clusters (2018)
  4. Buttari, Alfredo: Fine-grained multithreading for the multifrontal QR factorization of sparse matrices (2013)
  5. Ito, Satoshi; Goto, Kazuya; Ono, Kenji: Automatically optimized core mapping to subdomains of domain decomposition method on multicore parallel environments (2013)
  6. Ma, Teng; Bosilca, George; Bouteiller, Aurelien; Dongarra, Jack J.: Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms (2013)
  7. Bosilca, George; Bouteiller, Aurelien; Danalis, Anthony; Herault, Thomas; Lemarinier, Pierre; Dongarra, Jack: DAGuE: A generic distributed DAG engine for high performance computing (2012)
  8. Sandrieser, Martin; Benkner, Siegfried; Pllana, Sabri: Using explicit platform descriptions to support programming of heterogeneous many-core systems (2012)
  9. Broquedis, François; Furmento, Nathalie; Goglin, Brice; Wacrenier, Pierre-André; Namyst, Raymond: ForestGOMP: An efficient openMP environment for NUMA architectures (2010)