Monday, 22 January 2007

Handling MySQL replication failure gracefully

As any serious MySQL DBA knows, MySQL can and will now and again go out of sync, most of the time it cures itself, but sometimes the worst happens and a slave stops replicating. Yet I bet most apps keep on going with random outcomes, especially if your storing the users sessions in MySQL and have a master<->master setup.

To counter this problem I implemented a quick an easy check. If the check catches a replication failure `touch` a file to the nfs mounted tmp directory that is shared by all nodes.

Each node before executing anything (in index.php) checks for a file called 'halt' in tmp. If found exits and bingo no weirdness, corruption, or wondering why something is there one click, gone the next. It even emails you, and with a simple addition can sms.


mysql_query("show slave status;", CONNECTION);
if(mysql_errno(CONNECTION)!=0)
{
mail('team@<some url>',
'<project> REPLICATION ERROR!',
"Error show Slave Status: ".
mysql_errno(CONNECTION).'-'.mysql_error(CONNECTION));
$fp = fopen('tmp/halt', 'w+');
fclose($fp);
header('Location: index.php');
}


Don't forget to remove the 'halt' file after you have the replication under control.

No comments: