Monitoring Hadoop Clusters using Ganglia.
I spent a couple of days this week working with my Forward colleague Abs configuring Ganglia to monitor our Hadoop cluster and automating the installation to our production servers. The goal of this article is to provide an overview of the Ganglia architecture combined with our experience of getting it to play nicely with Hadoop.
Ganglia Overview
- Ganglia Monitoring Daemon (gmond) – gmond needs to be installed on each machine that you want to monitor. In our case this included our slave and master Hadoop nodes. The gmond service collects server metrics and exposes them over TCP.
- Ganglia Meta Daemon (gmetad) – The meta daemon polls all of the available gmond data sources (over TCP) and makes the data available to the web interface. We decided to use a dedicated server for the collection and presentation of the gathered metrics.
- Ganglia Web Application – A PHP-based web app that presents visualisations of server performance over various time periods.

Installing gmond on your Hadoop servers.
We found the following installation guide, Installing ganglia-3.1.1 on Ubuntu 8.04 Hardy Heron, helpful when installing gmond on our Hadoop servers. The main sections of our gmond configuration file (gmond.conf) were:
cluster {
  name = "hadoop"
  owner = "your company"
  latlong = "unspecified"
  url = "unspecified"
}

/* Specifies the port that gmond will receive data on */
udp_recv_channel {
  port = 8649
}

/* Specifies the port and host that this gmond service will send data to.
   Our gmond services post to themselves rather than to gmond services
   on other machines */
udp_send_channel {
  host = your.hadoop.host.name
  port = 8649
  ttl = 1
}

/* Specifies the port that metrics can be retrieved from */
tcp_accept_channel {
  port = 8650
}
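To check that a gmond service is answering on its tcp_accept_channel port, it can help to read the metrics directly. The following is a minimal Python sketch, not part of our setup: it assumes gmond writes an XML dump to any client that connects and then closes the connection, and that the dump contains HOST and METRIC elements with NAME and VAL attributes. The function names are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8650, timeout=5.0):
    """Read everything gmond sends on its TCP accept channel and return it as bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:  # assumption: gmond closes the socket once the dump is sent
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_bytes):
    """Return {host_name: {metric_name: value_string}} from a gmond XML dump."""
    root = ET.fromstring(xml_bytes)
    metrics = {}
    for host in root.iter("HOST"):
        metrics[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL") for metric in host.iter("METRIC")
        }
    return metrics


# Example use against the placeholder host from the configuration above:
#   dump = fetch_gmond_xml("your.hadoop.host.name", 8650)
#   for host, values in parse_metrics(dump).items():
#       print(host, len(values), "metrics")
```

If the connection is refused, the usual suspects are a firewall rule or a port that does not match the tcp_accept_channel setting.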
Configuring Hadoop to send metrics to gmond
Ganglia Monitoring Server
data_source "master" master.hadoop:8650
data_source "slave1" slave1.hadoop:8650
data_source "slave2" slave2.hadoop:8650
data_source "slave3" slave3.hadoop:8650
data_source "slave4" slave4.hadoop:8650
data_source "slave5" slave5.hadoop:8650
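When the installation is automated, the data_source entries can be generated from a list of hostnames rather than maintained by hand. This is a small Python sketch under our own assumptions: the helper name is hypothetical, and it names each source after the first label of the hostname, as in the entries above.

```python
def data_source_lines(hosts, port=8650):
    """Build one gmetad data_source line per host, named after the first hostname label."""
    lines = []
    for host in hosts:
        name = host.split(".")[0]  # "slave1.hadoop" -> "slave1"
        lines.append(f'data_source "{name}" {host}:{port}')
    return lines


# Example: print the entries for a master and two slaves.
for line in data_source_lines(["master.hadoop", "slave1.hadoop", "slave2.hadoop"]):
    print(line)
```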
Exposed Application Configuration
Problem:
As a developer or operations person
I want easy access to the current configuration of a web application
So that I can diagnose configuration problems more effectively
Solution:
Expose application properties as a simple HTML page. Using a URI such as /internal/status allows the page to be hidden from end users through appropriate configuration of your web server. For example:

In this example status page, each configurable property is listed alongside the configured value. The page even provides the location of the properties file should modifications need to be made.
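A page like this needs very little code. The following Python sketch shows one way to render it; the function name, the property keys and the file path are illustrative assumptions rather than details of the application described here.

```python
from html import escape


def render_status_page(properties, source_path):
    """Return an HTML page listing each property and its value, plus the file they came from."""
    rows = "\n".join(
        f"<tr><td>{escape(key)}</td><td>{escape(str(value))}</td></tr>"
        for key, value in sorted(properties.items())
    )
    return (
        "<html><body><h1>Application status</h1>\n"
        f"<p>Properties loaded from: {escape(source_path)}</p>\n"
        f"<table>\n<tr><th>Property</th><th>Value</th></tr>\n{rows}\n</table>\n"
        "</body></html>"
    )


# Example with made-up properties and a made-up file location.
page = render_status_page(
    {"service.url": "http://localhost:8080/api", "upload.dir": "/var/app/uploads"},
    "/etc/app/application.properties",
)
```

The returned string would then be served by whatever handler is mapped to /internal/status.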
Teams can go one step further and expose “health checks” through such a page. In this example the application has three dependencies that need to be satisfied for correct operation:
1) It needs to be able to access an HTTP endpoint.
2) It needs a directory to exist (with read/write permissions).
3) It needs to be able to connect to a database.
For each of these dependencies we can check whether it is satisfied. For example, does the directory exist? Can we read from the directory? Any failure can then be exposed visually, providing an early warning immediately after a deployment that the application is not healthy and requires further investigation.
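Checks of this kind can be sketched in a few lines of Python. The helper names below are hypothetical, and the database check is left as a caller-supplied function because the driver in use is not known here; treat this as an illustration of the idea rather than the implementation behind the example page.

```python
import os
import urllib.request


def check_http(url, timeout=3.0):
    """Pass if the endpoint responds; urlopen raises on connection errors and on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status < 400


def check_directory(path):
    """Pass if the directory exists and is both readable and writable."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK)


def run_checks(checks):
    """Run each named check; an exception counts as a failure instead of breaking the page."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


# Example wiring (URL, path and connect_to_database are placeholders):
#   results = run_checks({
#       "http endpoint": lambda: check_http("http://localhost:8080/ping"),
#       "upload directory": lambda: check_directory("/var/app/uploads"),
#       "database": lambda: connect_to_database() is not None,
#   })
```

The status page can then show each entry of the results as a pass or a fail next to the configured value.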
For more information on this topic and many other techniques for smoothing the path from dev to production I highly recommend Sam Newman’s QCon 2010 presentation: From Development to Production