Created: 05/30/2007
Last Edited: 05/30/2007
Author: Dayid Alan <dayid@dayid.org>
Location: http://linux.dayid.org/doc/redundant-backup.html
There may be some better ways to do this. There are definitely "other" ways. This document is just to keep track of my own configuration and the workings of my servers. It most definitely can serve as a decent reference for duplicating my work, but please only follow the steps you understand.
While hardware configuration shouldn't matter much for this simple exercise, it is still worth mentioning. This setup was initially done on two virtual machines running on OpenVZ. The host system is Noddy, and the kernel at the time of this experiment was 2.6.18-8.e15.028stab031.1PAE on a CentOS 4.5 system. For testing purposes, one VE was made with CentOS 4 and the other with Debian 4.0.
Create both nodes (or set up both machines). Have three (3) IP addresses available to use, whether they are "internal" or "external" IPs. In this situation I am working with VE200 (CentOS 4) and VE201 (Debian 4). Thus:
| Hostname: | ENV200 | ENV201 |
| System: | CentOS 4 | Debian 4 |
| Primary IP: | 192.168.0.200 | 192.168.0.201 |
| Redundant IP: | 192.168.0.205 | |
| Role: | Primary System | Redundant Backup |
So, what are we doing here?
GOAL: We are going to set up two machines with a monitoring program/script running on the backup machine, so that if the first machine goes down the second machine will automatically take over.
Process we expect to follow?
PROCESS: Hopefully, we will set up each server to have one unique IP address. Then we will have a shared IP address. All of our DNS will point to the shared IP address. The shared IP will remain on the first machine and through NAT will route all traffic to the first machine's unique IP. If machine two can no longer see machine one (meaning machine one is down) it will then assign itself the shared IP, which will be a NAT route to the second machine's unique IP. This way DNS does not have to re-propagate and the changeover should be seamless.
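As a rough sketch of the "assign itself the shared IP" step, the address can be added and removed with `ip` rather than `ifconfig`. The interface `venet0`, the alias `venet0:1`, and the 255.255.255.0 netmask (written here as /24) come from the scripts further down. The function names `take_shared_ip` and `release_shared_ip` and the `DRY_RUN` switch are my own assumptions for illustration, not part of the original setup. By default the commands are only printed, so nothing on the machine changes.

```shell
#!/bin/bash
# Sketch: add or remove the shared IP with `ip` instead of ifconfig.
SHARED_IP="192.168.0.205"
PREFIX="24"          # same as netmask 255.255.255.0
IFACE="venet0"
LABEL="venet0:1"     # alias name used by the ifconfig-based scripts

# Hypothetical helper: with DRY_RUN=1 (the default) print the command,
# with DRY_RUN=0 actually run it (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

take_shared_ip() {
    run ip addr add "${SHARED_IP}/${PREFIX}" dev "$IFACE" label "$LABEL"
}

release_shared_ip() {
    run ip addr del "${SHARED_IP}/${PREFIX}" dev "$IFACE"
}

take_shared_ip
release_shared_ip
```

Setting `DRY_RUN=0` before calling either function would execute the command for real; please only do that on a machine where you understand what the address change will do.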
Required utilities?
UTILITIES: bash, ip, iptables, rsync, vi
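Before going further, it may help to confirm that these utilities are actually installed on both machines. The following is only a small sketch; the function name `missing_tools` is made up for this example.

```shell
#!/bin/bash
# Sketch: report which of the required utilities are not on PATH.
REQUIRED_TOOLS="bash ip iptables rsync vi"

# missing_tools NAME...  -> prints each name that `command -v` cannot find
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Word splitting of $REQUIRED_TOOLS is intentional here.
missing=$(missing_tools $REQUIRED_TOOLS)
if [ -n "$missing" ]; then
    echo "Missing utilities:" $missing
else
    echo "All required utilities found."
fi
```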
Problems?
VE-vs-DEDICATED: I am going to try to replicate this process, as best I can, as if it were two dedicated servers. That said, if you are doing something like this with virtual environments on the same machine, you could simply change the NAT routing on the host to forward OUTSIDE -> ENV2 instead of OUTSIDE -> ENV1, using a single command line to change the iptables rule.
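For the same-host case, that single command line might look roughly like the one this sketch prints. I did not set my host up this way, so treat every specific here as an assumption: the outside address `203.0.113.10` is a placeholder, the rule is assumed to be rule number 1 in the `PREROUTING` chain of the `nat` table, and `dnat_switch_cmd` is a hypothetical helper. The script only prints the command and does not touch iptables.

```shell
#!/bin/bash
# Sketch: build the iptables command that repoints one DNAT rule on the
# host node from one VE to another. Nothing is executed, only printed.
OUTSIDE_IP="203.0.113.10"   # hypothetical outside address (placeholder)
RULE_NUM=1                  # assumed position of the rule in PREROUTING

# dnat_switch_cmd TARGET_IP -> prints a rule replacement pointing at TARGET_IP
dnat_switch_cmd() {
    local target="$1"
    echo "iptables -t nat -R PREROUTING ${RULE_NUM} -d ${OUTSIDE_IP} -j DNAT --to-destination ${target}"
}

# Fail over from ENV200 to ENV201:
dnat_switch_cmd 192.168.0.201
```

Running `iptables -t nat -L PREROUTING --line-numbers` on the host first would show which rule number actually applies.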
Choices?
HOW-TO: So we can do this a few different ways:
Issues:
By using the first method above, we would be presuming that the backup server will never go down without us knowing first. As the goal of this redundant setup is to make it so that if any one (1) server fails the other is still available, in this case if the backup system failed we would be out of luck.
SOLUTION: Set up each server to monitor the other, and use the first method. So here is our normal setup:

...and when the primary server goes down:

...and the same is true when the backup server goes down:

The following is a simple script for doing the monitoring from the backup server's side:
#!/bin/bash
# pingtest.sh
# Loop forever rather than re-invoking the script, so processes do not pile up.
while true; do
    if ping -c 1 192.168.0.200 >/dev/null; then
        echo "Everything is okay."
    else
        echo '========================='
        echo "Host server is down @ $(date)"
        echo 'Stealing shared IP!'
        echo '========================='
        # Bring up the shared IP as an alias on this machine.
        ifconfig venet0:1 192.168.0.205 netmask 255.255.255.0
    fi
    sleep 3
done
By using this and running both VEs (with ENV200 having IPs 192.168.0.200 & 192.168.0.205 and ENV201 having IPs 192.168.0.201 & 192.168.0.205) we can see this when we suddenly shut off ENV200:
ENV201-DEBIAN:/# bash pingtest.sh
Everything is okay.
Everything is okay.
Everything is okay.
Everything is okay.
Everything is okay.
Everything is okay.
Everything is okay.
Everything is okay.
====================
Host server is down @ Wed May 30 11:28:41 UTC 2007
Stealing shared IP!
====================
Everything is okay.
Everything is okay.
Everything is okay.
Everything is okay.
...and at that point we can do `ifconfig venet0:1` and see that it properly "stole" 192.168.0.205. So this would work as a script; the only problem would be putting the shared IP back on the main server once you get it back up. For that you would still have to manually do an `# ifconfig venet0:1 down` on the backup system and then bring the IP up on the main system.
To automate handing the IP back, we now have three scripts to run: two for the backup server, and one for the main server. Hopefully you can understand my bash'ing enough to understand their purposes:
#!/bin/bash
# pingtest.sh (for backup server)
# Watch the main server; when it stops answering, take the shared IP
# and hand control to pingtest3.sh.
while ping -c 1 192.168.0.200 >/dev/null; do
    echo "Everything is okay."
    sleep 3
done
echo '===================='
echo "Host server is down @ $(date)"
echo 'Stealing shared IP!'
echo '===================='
ifconfig venet0:1 192.168.0.205 netmask 255.255.255.0
# exec replaces this process instead of nesting a new one on every hand-off.
exec bash pingtest3.sh
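#!/bin/bash
# pingtest3.sh (for backup server)
# Runs while the backup holds the shared IP; when the main server answers
# again, release the IP and go back to pingtest.sh.
while ! ping -c 1 192.168.0.200 >/dev/null; do
    echo "Host is still down"
    sleep 3
done
echo '========================='
echo "Host server is back up @ $(date)"
echo "Releasing shared IP"
echo '========================='
ifconfig venet0:1 down
# exec replaces this process instead of nesting a new one on every hand-off.
exec bash pingtest.sh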
#!/bin/bash
# pingtest2.sh (for main server)
# If nobody answers on the shared IP, the backup has released it (or never
# held it), so bring it up here.
while true; do
    if ping -c 1 192.168.0.205 >/dev/null; then
        echo "Everything is okay."
    else
        echo '===================='
        echo "Backup server released IP @ $(date)"
        echo 'Stealing shared IP!'
        echo '===================='
        ifconfig venet0:1 192.168.0.205 netmask 255.255.255.0
    fi
    sleep 3
done
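One thing these scripts do not guard against is a single dropped ping triggering a takeover. A possible refinement, which is not part of my running setup, is to treat the other machine as down only after several failed checks in a row. In the sketch below, `peer_is_down`, `MAX_FAILS`, and `CHECK_DELAY` are names and values I made up for illustration.

```shell
#!/bin/bash
# Sketch: only report a peer as down after several failed checks in a row.
MAX_FAILS=3       # assumed threshold
CHECK_DELAY=1     # assumed pause in seconds between retries

# peer_is_down CMD [ARGS...]
# Runs CMD up to MAX_FAILS times. Returns 0 (peer is down) only if every
# run fails; returns 1 as soon as one run succeeds.
peer_is_down() {
    local i=1
    while [ "$i" -le "$MAX_FAILS" ]; do
        if "$@" >/dev/null 2>&1; then
            return 1      # one success is enough: peer is reachable
        fi
        if [ "$i" -lt "$MAX_FAILS" ]; then
            sleep "$CHECK_DELAY"
        fi
        i=$((i + 1))
    done
    return 0
}

# Example wiring on the backup server (not executed here):
#   if peer_is_down ping -c 1 192.168.0.200; then
#       ifconfig venet0:1 192.168.0.205 netmask 255.255.255.0
#   fi
```

Because the check command is passed in as arguments, the function can be tried out safely with `true` and `false` before pointing it at `ping`.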