
10.2 Ganglia

With a large cluster, just ensuring that every machine is up and running each day can be a daunting task if you try to do it manually. Fortunately, there are several tools you can use to monitor the state of your cluster. In clustering circles, the better known of these include Ganglia, Clumon, and Performance Co-Pilot (PCP). While this section describes Ganglia, you might reasonably consider any of these.

Ganglia is a real-time performance monitor for clusters and grids. If you are familiar with MRTG, you'll recognize the round-robin database package Ganglia uses, since it was developed for MRTG. Memory efficient and robust, Ganglia scales well and has been used with clusters of hundreds of machines. It is also straightforward to configure for use with multiple clusters so that a single management station can monitor all the nodes within multiple clusters. It was developed at the University of California, Berkeley (UCB), is freely available (via a BSD license), and has been ported to a number of different architectures.

Ganglia uses a client-server model and is composed of four parts. The monitor daemon gmond needs to be installed on every machine in the cluster. The backend for data collection, the daemon gmetad, and the web interface frontend are installed on a single management station. (There is also a Python class for sorting and classifying data from large clusters.) Data are transmitted using XML and XDR via both TCP and multicasting.

In addition to these core components, there are two command-line tools. The cluster status tool gstat provides a way to query gmond, allowing you to create a status report for your cluster. The metric tool gmetric lets you monitor host metrics beyond Ganglia's predefined metrics. For instance, suppose you have a program (and interface) that measures a computer's temperature on each node. gmetric can be used to pass that program's reading to gmond as a new metric. By running the gmetric command under cron, you could track computer temperature over time.
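As a rough sketch of the cron approach, the script below wraps a temperature reading in a gmetric call. The read_temp program, the metric name, and the script path in the crontab comment are hypothetical. The gmetric options shown (--name, --value, --type, --units) should be checked against the help output of the version you install.

```shell
#!/bin/sh
# Sketch: publish a node's temperature as a custom Ganglia metric.
# Assumption: READ_TEMP names a hypothetical program (not part of Ganglia)
# that prints a single number, the temperature in degrees Celsius.

GMETRIC=${GMETRIC:-gmetric}        # the gmetric binary
READ_TEMP=${READ_TEMP:-read_temp}  # hypothetical sensor program

report_temp() {
    temp=$("$READ_TEMP") || return 1
    # Refuse to publish anything that is not a plain number.
    case $temp in
        ''|*[!0-9.-]*) echo "unexpected reading: $temp" >&2; return 1 ;;
    esac
    "$GMETRIC" --name temperature --value "$temp" --type float --units Celsius
}

# Publish only when asked to, so the file can also be sourced for testing.
if [ "${1:-}" = "run" ]; then
    report_temp
fi

# Example crontab entry (hypothetical path) that reports once a minute:
# * * * * * /usr/local/bin/report_temp.sh run
```

Keeping the reading and the gmetric call in one small function makes it easy to swap in a different sensor program without touching the cron entry.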

Finally, Ganglia also provides an execution environment. gexec allows you to run commands across the cluster transparently and forward stdin, stdout, and stderr. This discussion will focus on the three core elements of Ganglia: gmond, gmetad, and the web frontend.

10.2.1 Installing and Using Ganglia

Ganglia can be installed by compiling the sources or by using RPM packages. Installing the software for the management station, i.e., the node that collects information from the other nodes and maintains the database, takes more work than installing the compute nodes. With large clusters, you may want to dedicate a machine to monitoring. For smaller clusters, you may be able to get by with your head node if it is reasonably equipped. We'll look at the installation of the management station first.

10.2.1.1 RRDtool

Before you begin, there are several prerequisites for installing Ganglia. First, your network and hosts must be multicast enabled. This typically isn't a problem with most Linux installations. Next, the management station or stations, i.e., the machines on which you'll install gmetad and the web frontend, will also need RRDtool, Perl, and a PHP-enabled web server.[2] (Since you will install only gmond on your compute nodes, these do not require Apache or RRDtool.)

[2] It appears that only the include file and library from RRDtool are needed, but I have not verified this. Perl is required for RRDtool, not Ganglia.

RRDtool is a round-robin database. As you add information to the database, the oldest data is dropped from the database. This allows you to store data in a compact manner that will not expand endlessly over time. Sources can be downloaded from http://www.rrdtool.org/. To install it, you'll need to unpack it and run configure, make, and make install.
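The round-robin idea can be illustrated with a few lines of shell. This is only a toy model of the behavior just described, not how RRDtool actually stores its data. The function name and the one-sample-per-line file layout are invented for the illustration.

```shell
#!/bin/sh
# Toy round-robin log: keep at most MAX samples, dropping the oldest.
# Illustration only; RRDtool uses its own file format and consolidation.

rr_add() {
    # usage: rr_add FILE MAX VALUE
    file=$1 max=$2 value=$3
    echo "$value" >> "$file"
    # Keep only the newest MAX lines, so the file never grows past that.
    tail -n "$max" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example:
# for v in 1 2 3 4 5; do rr_add samples.txt 3 "$v"; done
# samples.txt now holds the three newest samples: 3, 4, and 5.
```

However many samples are added, the file holds at most the configured number of lines, which is the property that keeps a round-robin database from growing without bound.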

[root@fanny src]# gunzip rrdtool-1.0.48.tar.gz

[root@fanny src]# tar -vxf rrdtool-1.0.48.tar

...

[root@fanny src]# cd rrdtool-1.0.48

[root@fanny rrdtool-1.0.48]# ./configure

...

[root@fanny rrdtool-1.0.48]# make

[root@fanny rrdtool-1.0.48]# make install

...

You'll see a lot of output along the way. In this example, I've unpacked and built the sources under /usr/local/src. If you want to install RRDtool in a particular directory, you can use the --prefix option to specify the directory when you run configure. It doesn't really matter where you put it, but when you build Ganglia you'll need to tell Ganglia where to find the RRDtool library and include files. (The Ganglia build later in this section assumes they are under /usr/local/rrdtool-1.0.48.)

10.2.1.2 Apache and PHP

Next, check the configuration files for Apache to ensure the PHP module is loaded. For Red Hat 9.0, the primary configuration file is httpd.conf and is located in /etc/httpd/conf/. It, in turn, includes the configuration files in /etc/httpd/conf.d/, in particular php.conf. What you are looking for is a configuration command that loads the PHP module somewhere in one of the Apache configuration files. That is, one of the configuration files should have some lines like the following:

LoadModule php4_module modules/libphp4.so

...

<Files *.php>

    SetOutputFilter PHP

    SetInputFilter PHP

    LimitRequestBody 524288

</Files>

If you used the package system to set up Apache and PHP, this should have been done for you. Finally, make sure Apache is running.
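If you would rather not read through the configuration files by hand, a search along the following lines can do the looking for you. This is a sketch that assumes a grep supporting the -r option (GNU grep does). The function name is made up for the example, and the pattern only checks for an uncommented LoadModule line whose module name begins with php.

```shell
#!/bin/sh
# Sketch: report whether any Apache configuration file under a directory
# contains an uncommented LoadModule line for a PHP module.

php_module_loaded() {
    # usage: php_module_loaded CONFDIR   (for example, /etc/httpd)
    grep -r -E -q '^[[:space:]]*LoadModule[[:space:]]+php' "$1"
}

# Example:
# php_module_loaded /etc/httpd && echo "PHP module is configured"
```

A match only shows that the directive is present in a file. It does not prove that Apache actually reads that file or that the module loads cleanly.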

10.2.1.3 Ganglia monitor core

Next, you'll need to download the appropriate software. Go to http://ganglia.sourceforge.net/. You'll have a number of choices, including both source files and RPM files, for both Ganglia and related software. The Ganglia monitor core contains both gmond and gmetad (although by default it doesn't install gmetad). Here is an example of using the monitor core download to install from source files. First, unpack the software.

[root@fanny src]# gunzip ganglia-monitor-core-2.5.6.tar.gz

[root@fanny src]# tar -xvf ganglia-monitor-core-2.5.6.tar

...

As always, once you have unpacked the software, be sure to read the README file.

Next, change to the installation directory and build the software.

[root@fanny src]# cd ganglia-monitor-core-2.5.6

[root@fanny ganglia-monitor-core-2.5.6]# ./configure \

> CFLAGS="-I/usr/local/rrdtool-1.0.48/include" \

> CPPFLAGS="-I/usr/local/rrdtool-1.0.48/include" \

> LDFLAGS="-L/usr/local/rrdtool-1.0.48/lib" --with-gmetad

...

[root@fanny ganglia-monitor-core-2.5.6]# make

...

[root@fanny ganglia-monitor-core-2.5.6]# make install

...

As you can see, this is a pretty standard install with a couple of small exceptions. First, you'll need to tell configure where to find the RRDtool include file and library by setting the various flags as shown above. Second, you'll need to explicitly tell configure to build gmetad. This is done with the --with-gmetad option.
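Before running configure, it can save time to confirm that the RRDtool files are where you think they are. The following sketch assumes the header is named rrd.h and the library files begin with librrd. The function name is invented for the example. It prints the flags to pass to configure only if both are found.

```shell
#!/bin/sh
# Sketch: check an RRDtool install prefix before building Ganglia.
# Assumptions: the header is include/rrd.h and the library is lib/librrd.*

check_rrd_prefix() {
    # usage: check_rrd_prefix PREFIX   (e.g. /usr/local/rrdtool-1.0.48)
    prefix=$1
    if [ ! -f "$prefix/include/rrd.h" ]; then
        echo "missing $prefix/include/rrd.h" >&2
        return 1
    fi
    if ! ls "$prefix"/lib/librrd.* >/dev/null 2>&1; then
        echo "no librrd library in $prefix/lib" >&2
        return 1
    fi
    # Both pieces are present; print flags suitable for configure.
    echo "CPPFLAGS=-I$prefix/include LDFLAGS=-L$prefix/lib"
}

# Example:
# check_rrd_prefix /usr/local/rrdtool-1.0.48
```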

Once you've built the software, you'll need to install and configure it. Both gmond and gmetad have very simple configuration files. The sample files gmond/gmond.conf and gmetad/gmetad.conf are included as part of the source tree. You should copy these to /etc and edit them before you start either program. The sample files are well documented and straightforward to edit. Most defaults are reasonable. Strictly speaking, the gmond.conf file is not necessary if you are happy with the defaults. However, you will probably want to update the cluster information at a minimum. The gmetad.conf file must be present and you'll need to identify at least one data source. You may also want to change the identity information in it.

For gmetad.conf, the data source entry is a list of the machines that will be monitored. The format is the identifier data_source followed by a unique string identifying the cluster. Next is an optional polling interval. Finally, there is a list of machines and optional port numbers. Here is a simple example:

data_source "my cluster" 10.0.32.144 10.0.32.145 10.0.32.146 10.0.32.147

The default sampling interval is 15 seconds and the default port is 8649.
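With more than a handful of nodes, typing the data_source line by hand invites mistakes. The helper below is a sketch that writes the line with the optional polling interval and an explicit port on every host, using the host:port form. The function name is hypothetical, and the output should be compared against the comments in your sample gmetad.conf.

```shell
#!/bin/sh
# Sketch: build a gmetad.conf data_source line with an explicit polling
# interval and port for each host.

make_data_source() {
    # usage: make_data_source NAME INTERVAL PORT HOST...
    name=$1 interval=$2 port=$3
    shift 3
    line="data_source \"$name\" $interval"
    for host in "$@"; do
        line="$line $host:$port"
    done
    echo "$line"
}

# Example:
# make_data_source "my cluster" 15 8649 10.0.32.144 10.0.32.145
# prints: data_source "my cluster" 15 10.0.32.144:8649 10.0.32.145:8649
```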

Once you have the configuration files in place and edited to your satisfaction, copy the initialization files and start the programs. For gmond, it will look something like this:

[root@fanny ganglia-monitor-core-2.5.6]# cp ./gmond/gmond.init \

> /etc/rc.d/init.d/gmond

[root@fanny ganglia-monitor-core-2.5.6]# chkconfig --add gmond

[root@fanny ganglia-monitor-core-2.5.6]# /etc/rc.d/init.d/gmond start

Starting GANGLIA gmond:                                    [  OK  ]

As shown, you'll want to ensure that gmond is started whenever you reboot.

Before you start gmetad, you'll want to create a directory for the database.

[root@fanny ganglia-monitor-core-2.5.6]# mkdir -p /var/lib/ganglia/rrds

[root@fanny ganglia-monitor-core-2.5.6]# chown -R nobody \

> /var/lib/ganglia/rrds

Next, copy over the initialization file and start the program.

[root@fanny ganglia-monitor-core-2.5.6]# cp ./gmetad/gmetad.init \

> /etc/rc.d/init.d/gmetad

[root@fanny ganglia-monitor-core-2.5.6]# chkconfig --add gmetad

[root@fanny ganglia-monitor-core-2.5.6]# /etc/rc.d/init.d/gmetad start

Starting GANGLIA gmetad:                                   [  OK  ]

Both programs should now be running. You can verify this by using telnet to connect to their respective ports, 8649 for gmond and 8651 for gmetad. When you do this, you should see a couple of messages followed by a fair amount of XML scrolling by.

[root@fanny ganglia-monitor-core-2.5.6]# telnet localhost 8649

Trying 127.0.0.1...

Connected to localhost.

Escape character is '^]'.

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>

<!DOCTYPE GANGLIA_XML [

   <!ELEMENT GANGLIA_XML (GRID)*>

...

If you see output such as this, everything is up and running. (Since you are going to the localhost, this should work even if your firewall is blocking TELNET.)
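The same check can be scripted. The sketch below reads the XML on standard input and prints the name of each host that appears in it. It assumes each HOST element starts on its own line with a NAME attribute, and the example assumes the nc (netcat) utility is installed. The function name is made up for the illustration.

```shell
#!/bin/sh
# Sketch: list the hosts named in gmond's XML output.
# Assumption: each <HOST ...> start tag sits on one line and has a NAME
# attribute. This is a pattern match, not a real XML parser.

list_hosts() {
    # Reads XML on stdin; prints one host name per line.
    sed -n 's/.*<HOST [^>]*NAME="\([^"]*\)".*/\1/p'
}

# Example, assuming nc is available:
# nc localhost 8649 | list_hosts
```

An empty result with a successful connection suggests that gmond is running but has not yet heard from any hosts.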

10.2.1.4 Web frontend

The final step in setting up the monitoring station is to install the frontend software. This is just a matter of downloading the appropriate file and unpacking it. Keep in mind that you must install this so that it is reachable as part of your website. Examine the DocumentRoot in your Apache configuration file and install the package under this directory. For example,

[root@fanny root]# grep DocumentRoot /etc/httpd/conf/httpd.conf

...

DocumentRoot "/var/www/html"

...

Now that you know where the document root is, copy the web frontend to this directory and unpack it.

[root@fanny root]# cp ganglia-webfrontend-2.5.5.tar.gz /var/www/html/

[root@fanny root]# cd /var/www/html

[root@fanny html]# gunzip ganglia-webfrontend-2.5.5.tar.gz

[root@fanny html]# tar -xvf ganglia-webfrontend-2.5.5.tar

There is nothing to build in this case. The configuration file is conf.php. Among other things, you can use this to change the appearance of your web site by changing the display themes.

At this point, you should be able to examine the state of this machine. (You'll still need to install gmond on the individual nodes before you can look at the rest of the cluster.) Start your web browser and visit your site, e.g., http://localhost/ganglia-webfrontend-2.5.5/. You should see something like Figure 10-1.

Figure 10-1. Ganglia on a single node


This shows the host is up. Next, we need to install gmond on the individual nodes so we can see the rest of the cluster. You could use the same technique used above; just skip over the prerequisites and the gmetad steps. But it is much easier to use RPM. Just download the package to an appropriate location and install it. For example,

[root@george root]# rpm -vih ganglia-monitor-core-gmond-2.5.6-1.i386.rpm

Preparing...                ########################################### [100%]

   1:ganglia-monitor-core-gm########################################### [100%]

Starting GANGLIA gmond: [  OK  ]

gmond is installed in /usr/sbin and its configuration file in /etc. Once you've installed gmond on a machine, it should appear on your web page when you click on refresh. Repeat the installation for your remaining nodes.
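Repeating the installation by hand gets tedious on anything but a small cluster. One way to automate it is sketched below, under the assumptions that root can reach each node with scp and ssh and that /tmp is an acceptable staging directory. The function name is invented for the example. By default, the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: copy the gmond RPM to each node and install it there.
# Assumptions: root can reach each node with scp and ssh, and /tmp on the
# node is an acceptable staging directory.

RUN=${RUN-echo}   # dry run by default: print commands instead of running them

install_gmond() {
    # usage: install_gmond RPMFILE NODE...
    rpm=$1
    shift
    for node in "$@"; do
        $RUN scp "$rpm" "root@$node:/tmp/" &&
        $RUN ssh "root@$node" rpm -vih "/tmp/$(basename "$rpm")"
    done
}

# Example (dry run):
# install_gmond ganglia-monitor-core-gmond-2.5.6-1.i386.rpm george hector
# To execute for real, set RUN to the empty string before calling install_gmond.
```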

Once you have Ganglia running, you may want to revisit the configuration files. With Ganglia running, it will be easier to see exactly what effect a change to a configuration file has. Of course, if you change a configuration file, you'll need to restart the appropriate services before you will see anything different.

You should have no difficulty figuring out how to use Ganglia. There are lots of "hot spots" on the pages, so just click and see what you get. The first page will tell you how many machines are up and down and their loads. You can select a physical view or collect information on individual machines. Figure 10-2 shows information for an individual machine. You can also change the metric displayed. However, not all metrics are supported. The Ganglia documentation supplies a list of supported metrics by architecture.

Figure 10-2. Ganglia Node View


As you can see, these screen captures were made when the cluster was idle. Had it been in use, the highlighted load figures would reflect that activity.
