03.13.07
Protecting MySQL data in a RHCS cluster
One of my biggest and most successful clients uses Red Hat Cluster Suite (RHCS) to make their MySQL database highly available, in a 2-node Active/Passive configuration. The idea behind RHCS is to remove every Single Point of Failure, thus making a service or services highly available. Well, my client’s cluster had a meltdown last summer because of one overlooked item.
A quick background blurb on clustering with Red Hat. The idea, like I said, is to remove all SPoFs by using as much redundancy as possible: multiple systems (nodes, in cluster lingo), multiple power sources, multiple network connections, multiple SAN connections, and so on. Even if a node completely dies, the cluster software will fail the service(s) over to another node in the cluster. That’s the theory, and it works rather well in practice.
So, how did the cluster melt down? Well, there’s this subsystem in Linux called the file system. It is responsible for managing the files and directories on some storage medium, like a hard drive. With a cluster, you can’t put your data on a local drive, because if that node fails, the other nodes can’t access the data. So you put the data on either a NAS (Network Attached Storage) or a SAN (Storage Area Network). In this case, we used a SAN because you don’t want to run MySQL over NAS (well, you can, but that’s a story for another day).
Long story short, the file system on the SAN holding the MySQL data files got corrupted. Please note, this was not a fault in Red Hat Cluster Suite. It did what it was supposed to do: fail completely. Failing the service over would have been pointless, because the data on the SAN is just as corrupted no matter which node is active. Exactly how it got corrupted, I don’t know. I do know that it was an ext2 file system (I didn’t build this cluster, I inherited it) and it blew up in spectacular fashion. I had to rebuild the file system on the SAN (using ext3 this time, which adds journaling) and restore the database from a backup that was about 10 hours old. Obviously, my client wasn’t very happy.
To protect against this type of failure in the future, we set up MySQL replication to another MySQL server, one that was neither part of the cluster nor attached to the SAN. If we have a total cluster failure, it takes only minutes to point the client’s application at the replication server and get their sites back up and running. We can then fix the cluster without rushing. To put the clustered database back into full production, we have to halt the application, dump the data from the replication server, import it into the production database, then fire the application back up. That process takes approximately 20-30 minutes (the dump+restore time depends on the database size, obviously).
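As a rough illustration of the two moving parts above, here is a minimal Python sketch. It is not the tooling we actually ran. The first function checks a row from MySQL’s SHOW SLAVE STATUS (passed in as a plain dict) to confirm the standby is replicating and reasonably caught up while the cluster is still healthy. The second builds the dump-and-import pipeline for moving data back to the cluster after the application has been halted. The host names, database name, and 60-second lag threshold are made up for the example, and credentials are assumed to come from a MySQL option file.

```python
import shlex


def replica_is_healthy(status, max_lag_seconds=60):
    """Return True if a SHOW SLAVE STATUS row shows working, caught-up replication.

    `status` is a dict of column name -> value for one row. The 60-second
    default threshold is an arbitrary assumption for this sketch.
    """
    lag = status.get("Seconds_Behind_Master")  # NULL (None) when replication is stopped
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and lag is not None
        and int(lag) <= max_lag_seconds
    )


def failback_pipeline(replica_host, cluster_host, database):
    """Build a shell pipeline that dumps `database` from the replica into the cluster.

    Assumes the application is already halted, so nothing is writing to
    either server while the dump runs.
    """
    dump = ["mysqldump", f"--host={replica_host}", "--databases", database]
    load = ["mysql", f"--host={cluster_host}"]
    return f"{shlex.join(dump)} | {shlex.join(load)}"


if __name__ == "__main__":
    # Hypothetical status row and host names, for illustration only.
    row = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 3}
    print(replica_is_healthy(row))
    print(failback_pipeline("replica.example.com", "cluster-vip.example.com", "appdb"))
```

Running a check like the first one on a schedule is what tells you the standby is actually usable before the day you need it. Once the cluster itself is down, the I/O thread on the standby will no longer report "Yes", so this is a routine monitoring check, not a failover trigger.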
A second solution we considered was a clustered file system (like GFS, Lustre or Veritas). That adds another layer of complexity to the original cluster. The file system cluster would itself need full redundancy of all its parts, which means more money spent and more physical resources to manage. Additionally, you end up with a cluster that depends on another cluster, which makes maintenance not very fun at all.
We’ve been running the cluster+replication scheme for over 6 months and it’s been very solid. We actually had to fail over to the replication database once, because we had to perform some lengthy maintenance on the SAN. Thanks to this arrangement, my client experienced less than 30 minutes of total downtime during a 4-hour maintenance window. They were happy, which makes me happy.
