PARMON

PARMON: a portable and scalable monitoring system for clusters

Workstation/PC clusters have become a cost-effective solution for high-performance computing. C-DAC’s PARAM 10000 (internal code name: OpenFrame) is a large cluster of high-performance workstations interconnected through low-latency, high-bandwidth networks. Managing and controlling such a large system is tedious and challenging, because workstations and PCs are typically designed to work as standalone systems rather than as parts of a cluster. We have designed and developed a tool called PARMON that allows effective monitoring and control of large clusters. It supports the monitoring of critical system resource activities and their utilization at three levels: the entire system, individual nodes, and individual components. It also allows the monitoring of multiple instances of the same component, for instance, the multiple processors in SMP cluster nodes. PARMON is a portable, flexible, interactive, scalable, location-transparent, and comprehensive environment based on client-server technology. Its two major components are parmon-server, which provides information on system resource activities and utilization, and parmon-client, a GUI-based client that interacts with parmon-server and with users, gathers data in real time, and presents it graphically for visualization. The client is developed as a Java application, and the server is developed as a multithreaded server in C using POSIX/Solaris threads, since Java does not provide interfaces for accessing system internals. PARMON is regularly used to monitor the PARAM 10000 supercomputer, a cluster of 48+ Ultra-4 workstations running the Solaris operating system. The recent popularity of Beowulf-class clusters (dedicated Linux clusters), owing to their price-performance ratio, has motivated us to port PARMON to Linux, which we accomplished by porting the system-dependent portions of parmon-server.
This enables management and monitoring of both Solaris- and Linux-based clusters (federated clusters) through a single user interface.
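The three monitoring levels (component, node, entire system) can be illustrated with a minimal sketch of how per-processor readings might be rolled up. It assumes that each processor reports cumulative busy and idle tick counters sampled at the start and end of an interval, which is the kind of data Linux exposes per CPU in /proc/stat. The names `CpuSample`, `Node`, and `system_utilization` are hypothetical and do not describe PARMON’s actual data structures. The sketch is written in Python for brevity, whereas parmon-server itself is written in C.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CpuSample:
    """Component level (hypothetical record): cumulative busy/idle ticks for
    one processor, sampled at the start (0) and end (1) of an interval."""
    busy0: int
    idle0: int
    busy1: int
    idle1: int

    def utilization(self) -> float:
        """Fraction of the interval this processor spent busy."""
        d_busy = self.busy1 - self.busy0
        d_idle = self.idle1 - self.idle0
        total = d_busy + d_idle
        return d_busy / total if total else 0.0


@dataclass
class Node:
    """Node level (hypothetical record): one workstation, possibly an SMP
    node with several processors (multiple instances of one component)."""
    name: str
    cpus: List[CpuSample] = field(default_factory=list)

    def utilization(self) -> float:
        """Mean utilization over this node's processors."""
        if not self.cpus:
            return 0.0
        return sum(c.utilization() for c in self.cpus) / len(self.cpus)


def system_utilization(nodes: List[Node]) -> float:
    """System level: mean over every processor in the cluster, so a node
    with more processors carries proportionally more weight."""
    samples = [c for n in nodes for c in n.cpus]
    if not samples:
        return 0.0
    return sum(c.utilization() for c in samples) / len(samples)


# Made-up counters: a 2-way SMP node and a uniprocessor node.
cluster = [
    Node("node01", [CpuSample(0, 0, 75, 25), CpuSample(0, 0, 25, 75)]),
    Node("node02", [CpuSample(100, 100, 200, 100)]),
]

for node in cluster:
    per_cpu = " ".join(
        f"cpu{i}={c.utilization():.2f}" for i, c in enumerate(node.cpus)
    )
    print(f"{node.name}: {per_cpu} node={node.utilization():.2f}")
print(f"system={system_utilization(cluster):.2f}")
```

With these made-up counters the script prints `node01: cpu0=0.75 cpu1=0.25 node=0.50`, `node02: cpu0=1.00 node=1.00`, and `system=0.67`. Averaging over all processors instead of over the node means, which would give 0.75, is one design choice among several. It keeps the system-level figure consistent with the component-level view when nodes have different processor counts.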