PARMON: a portable and scalable monitoring system for clusters
Workstation/PC clusters have become a cost-effective solution for high-performance computing. C-DAC’s PARAM 10000 (internal code name OpenFrame) is a large cluster of high-performance workstations interconnected through low-latency, high-bandwidth networks. Managing and controlling such a large system is tedious and challenging, since workstations/PCs are typically designed to work as standalone systems rather than as part of a cluster. We have designed and developed a tool called PARMON that allows effective monitoring and control of large clusters. It supports the monitoring of critical system resource activities and their utilization at three levels: entire system, node, and component. It also allows the monitoring of multiple instances of the same component, for instance multiple processors in SMP-type cluster nodes. PARMON is a portable, flexible, interactive, scalable, location-transparent, and comprehensive environment based on client-server technology. Its major components are parmon-server, which provides system resource activity and utilization information, and parmon-client, a GUI-based client that interacts with parmon-server and with users to gather data in real time and present it graphically. The client is developed as a Java application; the server is developed as a multithreaded server in C with POSIX/Solaris threads, since Java does not support interfaces to access system internals. PARMON is regularly used to monitor the PARAM 10000 supercomputer, a cluster of 48+ Ultra-4 workstations running the Solaris operating system. The recent popularity of Beowulf-class clusters (dedicated Linux clusters), driven by their price-performance ratio, has motivated us to port PARMON to Linux, which was accomplished by porting the system-dependent portions of parmon-server. This enables management and monitoring of both Solaris and Linux-based clusters (federated clusters) through a single user interface.
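The three monitoring levels named in the abstract (entire system, node, component) can be illustrated with a minimal Python sketch. This is not PARMON's code or data model: the `Sample` record, the function names, the node and component labels, and the choice of a plain average as the aggregation rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    """One utilization reading (hypothetical record, not PARMON's format)."""
    node: str           # assumed node name, e.g. "node1"
    component: str      # assumed component instance, e.g. "cpu0"
    utilization: float  # percent, 0 to 100


def component_level(samples):
    """Utilization of each component instance, keyed by (node, component)."""
    return {(s.node, s.component): s.utilization for s in samples}


def node_level(samples):
    """Average utilization per node across its component instances."""
    by_node = {}
    for s in samples:
        by_node.setdefault(s.node, []).append(s.utilization)
    return {node: mean(values) for node, values in by_node.items()}


def system_level(samples):
    """Average of the per-node averages (assumed aggregation rule)."""
    return mean(list(node_level(samples).values()))


# Example: one two-processor SMP node and one single-processor node.
samples = [
    Sample("node1", "cpu0", 20.0),
    Sample("node1", "cpu1", 40.0),
    Sample("node2", "cpu0", 60.0),
]
```

Keeping each processor as its own `(node, component)` entry mirrors the abstract's point about multiple instances of the same component in SMP nodes; the node and system views are then derived from the same readings.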
References in zbMATH (referenced in 6 articles, 1 standard article)
- Subramaniyan, Rajagopal; Raman, Pirabhu; George, Alan D.; Radlinski, Matthew: GEMS: Gossip-enabled monitoring service for scalable heterogeneous distributed systems (2006)
- Agarwala, Sandip; Poellabauer, Christian; Kong, Jiantao; Schwan, Karsten; Wolf, Matthew: System-level resource monitoring in high-performance computing environments (2003)
- Bellavista, Paolo; Corradi, Antonio; Stefanelli, Cesare: Java for on-line distributed monitoring of heterogeneous systems and services (2002)
- Giné, Francesc; Solsona, Francesc; Navarro, Xavi; Hernández, Porfidio; Luque, Emilio: MemTo: A memory monitoring tool for a Linux cluster (2001)
- Uthayopas, Putchong; Phatanapherom, Sugree: Fast and scalable real-time monitoring system for Beowulf clusters (2001)
- Buyya, Rajkumar: PARMON: a portable and scalable monitoring system for clusters (2000)