A Read Replica is a read-only copy (snapshot) of your source instance that can be used for scaling your read-traffic, reporting/data warehousing environment, scheduled maintenance activity, ensuring availability, or for disaster recovery.
Read Replicas may or may not get stored in the same region as the source DB instance in AWS. Read Replicas are supported by several database engines, allowing you to choose the one that best fits your project:
When you use Amazon RDS for your Read Replica, the first thing that it replicates in MySQL or MariaDB are any changes that occurred to the source DB instance in between the snapshot and the present moment. Lag may occur during this process.
One reason for the lag comes down to the way MySQL and MariaDB create read-replicas. It’s asynchronous technology, so if you have a parallel run source DB instance that has a lot of write operations occurring, the writes for the Read Replicas become serialized. Cross-region replicas could cause additional latency.
Another cause of replication latency comes from resource-intensive operations for DDL or object maintenance. For example, if you’re creating an index, you may encounter performance issues that present themselves as replication lag. Monitor your binary log retention period using rds_show_configuration and rds_set_configuration stored procedure calls.
The max retention for binary logs is 30 days. If your replication process stops for longer than this period, whether you stop it manually or it runs into a replication error, Amazon RDS will no longer replicate the source database instance.
Amazon CloudWatch makes it simple to monitor your binary logs. Search for the Amazon RDS ReplicaLag value to check for any replication problems. A value of -1 indicates one of two things: either replication is not currently active for the source DB instance, or it ran into an error that caused a failure state.A value of 0 indicates that the replica and the source DB instance are synced.
How to Reduce the Potential Latency on a Read Replica
- Avoid any large transactions on the Source DB instance. Do smaller transactions in chunks and perform frequent commits.
- Consider enabling parallel replication threads based on your supported versions of MySQL or MariaDB.
- Ensure DB Instance and Storage class are similar between the Source DB and Read Replica.
- Check the configuration of the DB parameter groups with the source DB instance and the Read Replica for compatibility issues.
- Turn the query cache off when your operation involves queries that have highly volatile objects. In such cases, disabling the query cache might improve your replication catch up.
Troubleshooting Replication Failures on a Read Replica
Amazon RDS has monitoring tools to inform you of issues during replication. If the replication fails or otherwise stops, the monitor indicates this information through an Error value in the Replication State field for that Read Replica instance.
MariaDB and MySQL make it particularly easy to get started on the troubleshooting process, so you’re in luck when you’re using either of these database engines. They have a Replication Error field that provides additional information about why the replication process stopped.
Common Replication Errors You May Encounter
The source DB instance should have a higher max_allowed_packet parameter value than the Read Replica. This replication issue has a simple fix. The source DB instance and the Read Replica can have the same value set for this parameter.
Replication also gets interrupted if the Read Replica has tables written to it. Using unsafe non-deterministic queries might lead to data inconsistencies and break replication.
Are in-memory tables part of your replication process? If the database instance ends up restarting, going to a failover, or promoting a read-replica, the tables get emptied.
Are you attempting to use non-transactional storage engines for your Read Replicas? Transactional storage engines are a must for Read Replica operations, so you would need to convert your non-transactional storage engines.
Here are database troubleshooting tips to help you fix issues during the replication process. Please be careful and read the AWS documentation for more details on these specific versions, commands and use cases.
- Do you have a logical error you can safely skip? Use the command mysql.rds_skip_repl_error.
- Running into binlog position problems? Make changes to the slave replay position through the command mysql_rds_next_master_log.
- Check your settings (defaults, based version) and adjust them to improve latency catchup. For example, parameters such as innodb_flush_log_at_trx_commit and sync-binlog allow you to achieve crash-safe environments and Read Replicas that are fully synced to the source DB instance on every commit. (Use ROW-based replication and GTIDs where supported)
Subscribe to Our Blog
Never miss a post! Stay up to date with the latest database, application and analytics tips and news. Delivered in a handy bi-weekly update straight to your inbox. You can unsubscribe at any time.
Most people will encounter this error when their application tries to connect to an Oracle database service, but it can also be raised by one database instance trying to connect to another database service via a database link.