Why Choose a Galera Based Clustering Solution for MySQL

Author: Wesley Lifford | 3 min read | February 2, 2016

You need your application to stay up and available in case of software/OS crash, network issues, or hardware failure. Simple Master/Slave replication is not Highly Available, since there will be some delay in your application’s availability even under the best of circumstances. Master/Master also has its share of downsides (Primary Key collisions and data drift just to name a couple) that are also amplified when you have network related issues. We could even consider using GTID based replication for data integrity across replication chain and use some of the MySQL utilities for faster failover/failback. Each has a trade-off of its own. Galera based clustering solutions eliminates these additional layers and provides looking for a pure synchronous HA solution. Galera based clustering solutions eliminates these additional layers and provides a pure synchronous HA solution.

Galera based clustering offers semi-synchronous replication, active-active multi-master topology (read and write to any node,) automatic cluster node joining, membership control, and dropping upon failure, parallel row-level replication, and no replication lag while providing a standard MySQL look and feel.

Galera based clustering works best when given a minimum of 3 data nodes. Yes, it is possible to run with only 2 data nodes and an arbitrator daemon on a 3rd node, but this configuration leaves you vulnerable to an outage when one of the data nodes suddenly has to assume all client requests because of a problem on the only other data node.

Overall, Galera based clustering is a robust and solid solution for high availability. Running on a 3 node Galera based cluster allows you to distribute writes and reads as you see fit while also allowing you to perform maintenance or schema upgrades without downtime.

Of course, this flexibility also comes with a price. You’ll need at least 3 nodes (and a way to load balance the connections to the cluster.) Because the writes have to be committed on all 3 nodes (take network latency into account), the overall system may be slightly slower than it would running standalone MySQL. The entire cluster will be as slow as the slowest node.

Currently, InnoDB is the only storage engine supported with Galera based replication. A primary key must also be defined on each table. Writes to tables of other storage engines are not replicated (DDL statements to the mysql tables are replicated). This means any other MyISAM tables would need to be converted to InnoDB before using Galera based clustering.

Galera cluster also has limitations on transaction size. The writesets are processed in memory and because of this, large transactions will impact performance. The default transaction limits are set for 128K rows and 1GB transaction size. These can be adjusted, but for best practices and performance, large transactions should be broken up into smaller chunks.

There are currently two forks of MySQL that you can use to setup a Galera based cluster, Percona or MariaDB. Deciding on which to use is a matter of personal preference.

Blog Author

Wesley Lifford

Wesley has more than 15 years of professional experience in database, network, and systems administration. His specializes in Linux systems and MySQL database administration, including Highly Available solutions, performance tuning, and automation scripting. As a database administrator, he enjoys finding creative solutions to complex problems.

Why Choose a Galera Based Clustering Solution for MySQL

Blog Author

Subscribe to Our Blog