Ryan Greenhall

Thoughts on Software Development

Monitoring Hadoop Clusters using Ganglia.


I spent a couple of days this week working with my Forward colleague Abs configuring Ganglia to monitor our Hadoop cluster and automating the installation to our production servers. The goal of this article is to provide an overview of the Ganglia architecture combined with our experience of getting it to play nicely with Hadoop.

Ganglia Overview

Ganglia comprises three components:
  1. Ganglia Monitoring Daemon (gmond) – gmond needs to be installed on each machine that you want to monitor. In our case this included our slave and master Hadoop nodes. The gmond service collects server metrics and exposes them as XML over TCP.
  2. Ganglia Meta Daemon (gmetad) – gmetad polls all of the available gmond data sources (over TCP) and makes the data available for the web interface. We decided to use a dedicated server for the collection and presentation of the gathered metrics.
  3. Ganglia Web Application – A PHP-based web app that presents various visualisations of server performance over various time periods.

[Figure: ganglia-hadoop-configuration]

Installing gmond on your Hadoop servers.

We found the following installation guide, Installing ganglia-3.1.1 on Ubuntu 8.04 Hardy Heron, helpful when installing gmond on our Hadoop servers.

We placed the gmond configuration in the default location, /etc/ganglia/gmond.conf, and made the following changes to the defaults:
cluster {
    name = "hadoop"
    owner = "your company"
    latlong = "unspecified"
    url = "unspecified"
}

/* Specifies the port that gmond will receive data on */
udp_recv_channel {
  port = 8649
}

/* Specifies the port and host that this gmond service will send data to. Our gmond services post to themselves rather than gmond services on other machines */
udp_send_channel {
    host = your.hadoop.host.name
    port = 8649
    ttl = 1
}

/* Specifies the port that metrics can be retrieved from */
tcp_accept_channel {
  port = 8650
}
Start gmond using sudo gmond. To check that gmond is collecting stats correctly, run telnet localhost 8650, which should output a stream of XML containing the collected stats.
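
The telnet check can also be scripted, which is handy when automating the installation. The following is a minimal sketch rather than part of our setup: count_metrics is a hypothetical helper name, and it assumes that each collected stat appears in the XML as a METRIC element.

```shell
# Count the METRIC elements in gmond XML read from standard input.
# count_metrics is a hypothetical helper name, not part of Ganglia.
count_metrics() {
    # grep -o prints each match on its own line, so wc -l gives the total.
    grep -o '<METRIC ' | wc -l | tr -d ' '
}
```

Assuming netcat is installed, nc localhost 8650 | count_metrics prints the number of metrics gmond is currently exposing. A result of 0 suggests that gmond is running but not collecting anything.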

Configuring Hadoop to send metrics to gmond

Fortunately for us, Hadoop provides gmond monitoring integration through org.apache.hadoop.metrics.ganglia.GangliaContext31, which is configured in hadoop-metrics.properties. A restart of the tasktracker is required for Hadoop-specific metrics to appear in the Ganglia web app.
/etc/init.d/hadoop-tasktracker restart
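
For illustration, the entries in hadoop-metrics.properties might look like the output of the sketch below. This is not a copy of our configuration: ganglia_metrics_properties is a hypothetical helper, and the dfs, mapred and jvm contexts and the 10 second period are assumptions based on a stock hadoop-metrics.properties file. The servers value should point at the port of the gmond udp_recv_channel (8649 above).

```shell
# Print hadoop-metrics.properties entries that route Hadoop metrics to gmond.
# ganglia_metrics_properties is a hypothetical helper; the dfs, mapred and jvm
# contexts and the 10-second period are assumptions, not the author's settings.
ganglia_metrics_properties() {
    servers="$1"    # host:port of the gmond udp_recv_channel
    for context in dfs mapred jvm; do
        printf '%s.class=org.apache.hadoop.metrics.ganglia.GangliaContext31\n' "$context"
        printf '%s.period=10\n' "$context"
        printf '%s.servers=%s\n' "$context" "$servers"
    done
}

ganglia_metrics_properties your.hadoop.host.name:8649
```

The printed lines can be reviewed and then merged into hadoop-metrics.properties on each node before restarting the Hadoop daemons.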

Ganglia Monitoring Server

We decided to install gmetad and the Ganglia web app on a standalone machine. Once again we found Installing ganglia-3.1.1 on Ubuntu 8.04 Hardy Heron very helpful in installing these two components. Once gmetad has been installed it needs to know which data sources to poll for metrics. To do this we added the following entries to /etc/ganglia/gmetad.conf:
data_source "master" master.hadoop:8650
data_source "slave1" slave1.hadoop:8650
data_source "slave2" slave2.hadoop:8650
data_source "slave3" slave3.hadoop:8650
data_source "slave4" slave4.hadoop:8650
data_source "slave5" slave5.hadoop:8650
Finally, start gmetad to see server metrics in the Ganglia web app (http://your.ganglia.host/ganglia).
sudo gmetad
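
The data_source entries in gmetad.conf follow a regular pattern, so they can be generated from a list of hosts when automating the installation. A minimal sketch, assuming each source is named after the first label of its host name (gmetad_data_sources is a hypothetical helper, not a Ganglia tool):

```shell
# Print one gmetad data_source line per host.
# gmetad_data_sources is a hypothetical helper, not part of Ganglia.
gmetad_data_sources() {
    port="$1"    # gmond tcp_accept_channel port
    shift
    for host in "$@"; do
        # ${host%%.*} strips everything from the first dot: master.hadoop -> master
        printf 'data_source "%s" %s:%s\n' "${host%%.*}" "$host" "$port"
    done
}

gmetad_data_sources 8650 master.hadoop slave1.hadoop slave2.hadoop
```

The output can be appended to /etc/ganglia/gmetad.conf, after which gmetad needs a restart to pick up the new sources.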

Written by Ryan Greenhall

October 22nd, 2010 at 2:04 pm

Posted in devops