HPC environment management: new challenges in the petaflop era

High performance computing (HPC) is becoming increasingly popular. The largest supercomputers in the world now have hundreds of thousands of processors and, consequently, may experience more software and hardware failures. HPC center managers also have to deal with multiple clusters from different vendors, each with its own architecture. However, since there are not enough HPC experts to manage all the new supercomputers, non-experts are expected to manage these large clusters. In this paper we study the new challenges of managing HPC environments that contain clusters of different sizes and architectures. We review the available tools and present LEMMing, an easy-to-use open source tool developed to support high performance computing centers. LEMMing integrates machine resources and the available management and monitoring tools into a single point of management.
References in zbMATH (referenced in 1 article, 1 standard article)