At $WORK we typically use Redhat Cluster to make MySQL highly available. We have a number of 2-node clusters (active/passive) and it works nicely, for the most part. One thing I find very annoying is RHCS has no notification framework to notify you when a service is relocated to another node (because of failure or otherwise.)
In a perfect world, the only reason MySQL would pass between nodes is for server maintenance, under the control of a human. In reality, crap happens. Redhat Cluster manages all this, but I still want to know that a failover happened (is hardware going bad? did a software bug cause a failure? etc.)
Most monitoring tools weren’t designed to report on a clustered service. I can add a check that connects to MySQL using the cluster-managed IP - but that only tells me if MySQL has failed completely. The failover between nodes generally takes under a minute so it’s easy for that type of monitoring check to miss a failover. Parsing syslog output is less than optimal because syslog is unstructured text. Blech. Process and disk checks can’t be used on each node because they’ll only exist on one node, the active node.
My current solution is a bit of a kludge but it works. We configure MySQL to use standard names for binary logs and error logs (they never change, regardless of which node the database is running on.) The only file name that isn’t standardized is the pid file, which defaults to the hostname of the server running MySQL. So, with that bit of info, I wrote a script that connects to MySQL using the cluster-managed IP and issues the query “show variables like ‘pid_file’” and parses the result. It then compares the hostname against a stored value in a file, which contains the hostname of the current active node. If the hostname changed, then MySQL has changed nodes (whether due to failure or manual relocation.) This changing of hostnames is considered an error condition and the monitoring system sends alerts appropriately. The NOC staff consult procedures to determine the level of failure and call people if necessary. Clearing the error requires editing the file and setting the current hostname as the active node.
What would be nice would be some informational functions in MySQL to give a bit more data about the server itself, things like hostname, IP address MySQL is bound to, port, etc. Then I could use “select hostname()” or similar instead of relying on a file name. Before someone says “you connected to the database so you know all that already,” not all of this information is known to a database connection, particularly if your application uses a database connection pool or the connection goes through a proxy, masking the real database.
You might find the following script useful as well. It checks each node to see if cluster filesystems/partitons are mounted on more than one node. Outputs error codes for monitoring.
http://themattreid.com/.files/scripts/mysql/check_cluster_filesystems
[...] Stoner, inhabitant of Stoner’s World, has an item about his use of monitoring MySQL and Red Hat Cluster. “In a perfect world, the only reason MySQL would pass between nodes is for server [...]