Sunday, March 19, 2006

Build a Failsafe Cluster using Heartbeat in Linux

Reason for having a Failsafe cluster:

Failsafe clusters can dramatically cut downtime, and since service failovers are fast and automatic, system administrators get to enjoy their weekends and
users get to finish their work. "Sys Ads" and "users" are both happy, and even pointy-haired managers are happy, because minimizing work stoppages saves money.

In a critical environment like a datacenter, hundreds of domains may be
hosted on a single server. Imagine that server going down because of a
hardware failure.

The servers in a Failsafe Cluster keep an eye on each other, and if one goes
down then the other running server starts servicing the requests of the
failed server.


For example, if you have a Web Server running on 202.41.2.10 and it
goes down, another machine in the cluster will immediately be alerted
to the fact. It will take over the Primary Server's IP address,
202.41.2.10, and start acting as the Primary Server that is down.

Prerequisites before you build a Cluster environment.

In my opinion,

- You will need two or more identical servers for optimal performance.

- If you are installing Redhat Enterprise 3.0 on the Primary, then install the
same on all the Secondary Servers.

- Make sure every service that you have configured on your Primary Server is
also installed and configured on all the Secondary Cluster Servers.

- You need two network cards in every server that you want to bring into the
Cluster environment.

If you have more than two servers for clustering then use a hub to
connect the eth0 interfaces. Since a hub does not store MAC addresses, it
will be easier to switch over. Here, I am assuming eth1 is the external world
and eth0 is internal.


In my example I have 2 systems with the same hardware, configuration and
services running on them.

There are tools like cfengine and kickstart installation which will help
you build similar systems.

My Configuration : -

Mandrake 2006 installed on both systems.

2 NICs - eth0/eth1

A CAT5 crossover cable connects eth0 on the Primary to eth0 on the Secondary.
This link carries the heartbeat. It lets the Secondary sense when the
Primary goes down, so that it can take over.


Primary.sriram.com

External - eth1 - 192.168.0.1/255.255.255.0
Internal - eth0 - 10.33.16.1/255.255.255.0

Secondary.sriram.com

External - eth1 - 192.168.0.2/255.255.255.0
Internal - eth0 - 10.33.16.2/255.255.255.0

Here one is the Master Node and the other is the Backup Node.

I have these Fully Qualified Domain Names :

Primary - primary.sriram.com

Secondary - secondary.sriram.com

Installing Heartbeat on the Primary and Secondary Nodes :

On Mandrake 2006 I have the URPMI package manager, so I run this as root:

#urpmi heartbeat

Heartbeat is also available here :

http://linux-ha.org/download


There are binary RPMs at the website, or you can build heartbeat from source. Grab the tarball (or install the source RPM) and untar it into your favorite source directory. From the top of the source tree, type:

./ConfigureMe configure
make
make install

If you have problems installing the RPMs found at the website and want a way to make your own, there may be help in the FAQ.

Configuring Clusters :

This involves modifying 3 files on all the cluster nodes (Primary, Secondary, etc.) :

/etc/ha.d/ha.cf

/etc/ha.d/haresources

/etc/ha.d/authkeys

The first file, ha.cf, contains all the settings for your nodes.

You need to set the following directives on both machines.

node primary

node secondary

deadtime 30

warntime 10

bcast eth0

auto_failback on
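
To see the directives above in one place, here is a minimal sketch that writes them into a sample file. The HA_CF variable and the ./ha.cf.sample default are my own assumptions for illustration; review the result and copy it to the real ha.cf as root. The comments inside the file describe what each directive does as I understand it.

```shell
#!/bin/sh
# Sketch only: write the directives from this post into a scratch file.
# HA_CF is a hypothetical variable, not something heartbeat itself reads.
HA_CF="${HA_CF:-./ha.cf.sample}"

cat > "$HA_CF" <<'EOF'
# Members of the cluster
node primary
node secondary
# Seconds of silence before a node is declared dead
deadtime 30
# Seconds of silence before a late-heartbeat warning is logged
warntime 10
# Send heartbeats as broadcasts on the crossover link
bcast eth0
# Hand resources back to the primary when it returns
auto_failback on
EOF

# Show the active (non-comment) lines for review
grep -v '^#' "$HA_CF"
```

The same file goes on every node, so generating it once and copying it around helps keep the nodes consistent.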

Point : The names on the node lines should match what "uname -n" prints on
each machine. Add the primary and secondary names, with their eth0 IP
addresses, to the /etc/hosts file on every node, and check that each node can
reach the other by name, i.e. ping primary and ping secondary.
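
The name-resolution step could look something like this. It is a sketch under assumptions: I map both the short and the full names to the eth0 addresses from my configuration, and HOSTS_FILE is a hypothetical variable that defaults to a scratch file, so nothing touches the real /etc/hosts until you copy the lines over yourself.

```shell
#!/bin/sh
# Sketch only: prepare the hosts entries in a scratch file first.
# HOSTS_FILE is a hypothetical variable used for illustration.
HOSTS_FILE="${HOSTS_FILE:-./hosts.sample}"

cat > "$HOSTS_FILE" <<'EOF'
10.33.16.1  primary.sriram.com    primary
10.33.16.2  secondary.sriram.com  secondary
EOF

# Confirm that each node name is listed
for name in primary secondary; do
    if grep -qw "$name" "$HOSTS_FILE"; then
        echo "$name: listed"
    else
        echo "$name: MISSING"
    fi
done

# Once the lines are in /etc/hosts on both nodes, test by hand:
#   ping -c 1 primary
#   ping -c 1 secondary
```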

If you have more than one backup node in the cluster, then in the ha.cf file
on each machine you will enter the full list of machines in the cluster.

node primary

node secondary

node secondary1

node secondary2 and so on ...


Next open /etc/ha.d/haresources and provide the IP address as follows

primary.sriram.com 192.168.0.3 httpd smb

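So, this line dictates that on startup the Primary node serves the
IP 192.168.0.3 and starts apache (httpd) and samba (smb) as well. The node
name at the start of the line should be written the same way as on the node
lines in ha.cf.

On shutdown, heartbeat will first stop smb, then apache, and then give up the IP.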
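
Heartbeat starts and stops the services named on that line through init-style scripts, so it is worth checking that a script exists for each one on every node. Here is a small sketch; check_resources and INIT_DIR are hypothetical names of mine, and the default directory is the one my init scripts live in, so adjust it for your distribution.

```shell
#!/bin/sh
# Sketch only: report whether an init script exists for each resource name.
# INIT_DIR and check_resources are illustrative names, not part of heartbeat.
INIT_DIR="${INIT_DIR:-/etc/rc.d/init.d}"

check_resources() {
    for svc in "$@"; do
        if [ -x "$INIT_DIR/$svc" ]; then
            echo "$svc: found"
        else
            echo "$svc: missing in $INIT_DIR"
        fi
    done
}

# The two services from the haresources line above
check_resources httpd smb
```

Run it on the Secondary as well as the Primary, since a failover can only start services that are actually installed there.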

Now coming to the final part, /etc/ha.d/authkeys :

This file determines the authentication keys.

There are three types of authentication available :

1) CRC  - Level 1 Security (Simplest).
2) MD5  - Level 2 Security (Medium Level).
3) SHA1 - Level 3 Security (Tough to crack).


The format of the file is as follows.

For CRC, a sample might be:

auth 2
2 crc

Whatever index you put after the keyword auth must be found below in the keys listed in the file. If you put "auth 4", then there must be a "4 signaturetype" line in the list below.

Make sure the file's permissions are safe, like 600. Also, the "any text you want" in the sample below is not quite right: there is a limit to the number of characters you can use in the key.

For sha1, a sample /etc/ha.d/authkeys could be:

auth 1
1 sha1 key-for-sha1-any-text-you-want

For md5, you could use the same as the above, but replace "sha1" with "md5".
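
Rather than typing a key by hand, you could generate one. The following is a sketch, not the official procedure: it hashes some random bytes to get a 40-character key, writes a sha1 authkeys file, and sets the permissions to 600. AUTHKEYS is a hypothetical variable that defaults to a scratch file, and I am assuming a 40-character key is within the length limit.

```shell
#!/bin/sh
# Sketch only: build an authkeys file with a random sha1 key.
# AUTHKEYS is a hypothetical variable; copy the result to the real
# authkeys location on every node (the key must be identical on all of them).
AUTHKEYS="${AUTHKEYS:-./authkeys.sample}"

# Hash 512 random bytes to get a 40-character hexadecimal string
KEY=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | cut -d' ' -f1)

# Create the file inside a subshell so the umask change stays local
(
    umask 077
    printf 'auth 1\n1 sha1 %s\n' "$KEY" > "$AUTHKEYS"
)
chmod 600 "$AUTHKEYS"

ls -l "$AUTHKEYS"
```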


Test Heartbeat

Run this on all nodes:

/etc/rc.d/init.d/heartbeat start

Shut down the Primary server and see if the Secondary server comes up with
the cluster IP address (192.168.0.3 in the haresources example) assigned to it.
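
One way to check the result on the Secondary is to look for the cluster IP on its external interface. This is only a sketch: holds_ip and SERVICE_IP are hypothetical names of mine, and I am assuming the "ip" tool is installed and that the address shows up on eth1.

```shell
#!/bin/sh
# Sketch only: check whether this node currently holds the cluster IP.
# SERVICE_IP and holds_ip are illustrative names, not part of heartbeat.
SERVICE_IP="${SERVICE_IP:-192.168.0.3}"

# Reads "ip addr" style output on stdin; succeeds if the address is listed
holds_ip() {
    grep -qF "inet $1/"
}

# Only query the interface when the ip tool is available
if command -v ip >/dev/null 2>&1; then
    if ip addr show eth1 2>/dev/null | holds_ip "$SERVICE_IP"; then
        echo "this node holds $SERVICE_IP"
    else
        echo "this node does not hold $SERVICE_IP"
    fi
fi
```

Run it on the Secondary before and after shutting down the Primary; the answer should change from "does not hold" to "holds" once the failover has happened.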



That's it!