Configuring Rack Awareness in Hadoop

Hadoop divides data into multiple file blocks and stores them on different machines. If rack awareness is not configured, Hadoop may place all the copies of a block on the same rack, which results in data loss if that rack fails.

Rack failure is less frequent than node failure, but this risk can still be avoided by explicitly configuring rack awareness in core-site.xml.

Rack awareness is configured using the property “” in core-site.xml.

If “” is not configured, /default-rack is returned for any IP address, i.e., all nodes are placed on the same rack.
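The default behaviour can be pictured as a topology script that answers /default-rack for every node it is asked about. The sketch below only models that effect and is not Hadoop's internal code. The function name default_rack and the addresses passed to it are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a lookup that behaves like an unconfigured cluster.
# It receives host names or IP addresses as arguments and prints
# one rack path per argument, separated by spaces.
default_rack() {
  local node
  for node in "$@"; do
    # Every node gets the same rack, whatever its address is.
    printf '/default-rack '
  done
}

# Hypothetical nodes, used only for illustration.
default_rack 192.0.2.1 192.0.2.2 hadoopdata1
echo
```

Because every node resolves to the same rack path, Hadoop has no way to spread the copies of a block across racks.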

Configuring rack awareness in Hadoop involves two steps:

  1. Configure the “” property in core-site.xml.

  2. Implement the script as desired. A sample rack-awareness script is shown below.
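As a rough illustration of the first step, the helper below prints a property element that could be pasted into core-site.xml. The property name is an assumption, because the text above leaves it blank: Hadoop 2 and later document net.topology.script.file.name, while older releases used topology.script.file.name. The function name topology_property and the script path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a core-site.xml property element that points at a
# topology script.
# ASSUMPTION: the property name is net.topology.script.file.name
# (Hadoop 2 and later); the original text does not state the name.
topology_property() {
  local script_path=$1
  cat <<EOF
<property>
  <name>net.topology.script.file.name</name>
  <value>${script_path}</value>
</property>
EOF
}

# Hypothetical script location, used only for illustration.
topology_property /etc/hadoop/conf/topology.sh
```

The value should be the absolute path of an executable script on the machine that resolves the topology.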

Sample 1: Script with datafile

Topology Script

A sample Bash shell script:


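while [ $# -gt 0 ] ; do
  nodeArg=$1                       # node (host name or IP) to look up
  exec< ${HADOOP_CONF}/
  result=""
  while read line ; do
    ar=( $line )                   # ar[0] = node, ar[1] = rack path
    if [ "${ar[0]}" = "$nodeArg" ] ; then result="${ar[1]}" ; fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default/rack "       # node not listed in the data file
  else
    echo -n "$result "
  fi
done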

Topology data

                       /dc1/rack1
hadoopdata1            /dc1/rack1
                       /dc1/rack2
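The lookup that the sample script performs can be tried out without a cluster. The sketch below is a self-contained variant of the same idea, not the script above: it writes a throwaway data file and resolves a few nodes against it. The function name resolve_racks, the temporary file, and the hadoopdata2 and unknown-host entries are hypothetical. Only hadoopdata1 and the rack paths come from the sample data.

```shell
#!/usr/bin/env bash
# Sketch: resolve nodes to racks using a two-column data file
# (first column: host name or IP, second column: rack path).
data_file=$(mktemp)
trap 'rm -f "$data_file"' EXIT

# hadoopdata2 is a hypothetical entry added for illustration.
cat > "$data_file" <<'EOF'
hadoopdata1   /dc1/rack1
hadoopdata2   /dc1/rack2
EOF

resolve_racks() {
  local file=$1; shift
  local node host rack result
  for node in "$@"; do
    result=""
    while read -r host rack _; do
      if [ "$host" = "$node" ]; then
        result=$rack
        break
      fi
    done < "$file"
    # Fall back to a default rack when the node is not listed.
    printf '%s ' "${result:-/default/rack}"
  done
}

resolve_racks "$data_file" hadoopdata1 hadoopdata2 unknown-host
echo
```

On a running cluster, the rack assignment that Hadoop actually sees can be checked with hdfs dfsadmin -printTopology.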


