Select Page

Using Checksums to Ensure Table Consistency in MySQL

Mike Hamrick | | January 26, 2009

If you’re using MySQL replication, chances are your master and slave databases aren’t entirely consistent. There are a number of reasons for this. Some MySQL functions like UUID() don’t replicate properly with statement based replication. Poor network connectivity can sometimes result in corrupted relay logs. Misconfigured software can accidently write to the slave database. Statements on the master will sometimes complete partially due to MySQL shutdown or a statement getting killed or interrupted. These are just a few reasons why replicated databases are often inconsistent with the master, and there are probably a lot more. If you’re using a replicated MySQL slave for performing backups and you aren’t regularly verifying that your databases are consistent, you could be in for a nasty surprise when it comes time to restore from a backup.

This article explains how to perform these consistency checks both manually and with third party tools. Maatkit is a collection of MySQL tools from the good folks at Percona. The Maatkit tool we’ll be focusing on in the second half of this article is called mk-table-checksum. This tool allows you to compare any number of replicated databases for consistency.

A checksum function reduces an arbitrary chunk of information (in this case table data) to a fixed sized number that represents this data. This checksum acts as a fingerprint of sorts. MySQL implements the CRC-32 checksum algorithm which can be used to reduce the string ‘hello world’ to its 32-bit fingerprint.

[code language=”sql”] mysql> SELECT CRC32(‘hello world’);
+———————-+
| CRC32(‘hello world’) |
+———————-+
| 222957957 |
+———————-+[/code]

You can also use the MySQL CHECKSUM TABLE command to perform a CRC-32 checksum on the every row in the table.

[code language=”sql”]
mysql> CREATE TABLE `foo` (`greeting` char(11)) ENGINE=MyISAM;
mysql> INSERT INTO `foo` VALUES (‘hello world’);
mysql> CHECKSUM TABLE `foo`;
+—————+————+
| Table | Checksum |
+—————+————+
| bluegecko.foo | 3342502396 |
+—————+————+
[/code]

Now you’ll notice that this number is different from the ‘hello world’ checksum we did saw earlier. This is because CHECKSUM TABLE performs a CRC32 checksum on every row in the table including the row metadata. There is one or more bytes of metadata preceding each row. I just happen to know that in this case there is one byte for this row and it will contain the number 253, or in hex 0xFD.

[code language=”sql”]mysql> SELECT CRC32(CONCAT(0xFD, ‘hello world’));
+————————————+
| CRC32(CONCAT(0xFD, ‘hello world’)) |
+————————————+
| 3342502396 |
+————————————+[/code]

The way data is organized in rows isn’t always consistent across major versions of MySQL, which means that the checksums returned by CHECKSUM TABLE for the same data on MySQL 4.1 and MySQL 5.0 may differ. The checksums may also differ if you’re comparing two tables that have the same data but use different data types to store that data.

[code language=”sql”]
mysql> CREATE TABLE `bar` (`greeting` VARCHAR(255)) ENGINE=MyISAM;
mysql> INSERT INTO `bar` VALUES (‘hello world’);
mysql> CHECKSUM TABLE `bar`;
+—————+————+
| Table | Checksum |
+—————+————+
| bluegecko.bar | 2963561228 |
+—————+————+
[/code]

As you can see, the checksum for bar doesn’t match the checksum for foo. This is because the row metadata is different. Variable length strings require extra record keeping in the metadata.

The above caveats not withstanding, you can see how CHECKSUM TABLE can be very handy to verify that a table on both the master and slave are identical. The simplest and most straightforward method of doing this involve logging in to both databases in separate windows and running the CHECKSUM TABLE command on the same table at roughly the same time. This will likely work if the table you’re checksumming isn’t very busy. But if the checksums differ, you’re going to need to use a slightly more complicated method. On the master you start by locking the table so it can’t be written to:

[code language=”sql”]mysql> LOCK TABLE `tablename` READ[/code]

Once you’ve done that, you run CHECKSUM TABLE on the master and note the checksum. Then you go to the slave and use SHOW SLAVE STATUS to verify that it is caught up with the master and that all writes to the table have completed. At this point you can run CHECKSUM TABLE on the slave and the two checksums should match. As soon as you’ve done this you’ll want to run UNLOCK TABLES on the master so the table doesn’t remain locked.

You can see how doing this process manually for every table in your database could be time consuming and tedious work. Wouldn’t it be great if there was a tool that could do this for you? Luckily there is. I’ll cover how to use mk-table-checksum in the next half of this article.

12c Upgrade Bug with SQL Tuning Advisor

This blog post outlines steps to take on Oracle upgrade 11.2 to 12.1 if you’re having performance problems. Oracle offers a patch and work around to BUG 20540751.

Megan Elphingstone | March 22, 2017

Oracle EPM Cloud Vs. On-Premises: What’s the Difference?

EPM applications help measure the business performance. This post will help you choose the best EPM solutions for your organization’s needs and objectives.

Bobby Ellis | April 10, 2018

Scripting Out the Logins, Server Role Assignments, and Server Permissions

Imagine there are over one hundred logins in the source server and you need to migrate them all over to the destination server. Wouldn’t it be awesome if we could automate the process by generating the scripts for the required tasks?

JP Chen | October 1, 2015

Work with Us

Let’s have a conversation about what you need to succeed and how we can help get you there.

CONTACT US

Work for Us

Where do you want to take your career? Explore exciting opportunities to join our team.

EXPLORE JOBS