How to Resolve Systemd Issue with Percona XtraDB Cluster on CentOS 7
Author: Srinivasa Krishna | January 18, 2017
Under CentOS 7, with the transition from System V to Systemd, Percona XtraDB Cluster (PXC) was integrated with Systemd. As a result, PXC's operation mode was split into two different unit files:
/usr/lib/systemd/system/mysql.service: Starts PXC with mysqld_safe, i.e. starts, stops, or restarts a single node.
/usr/lib/systemd/system/mysql@.service: Passes an environment file to PXC, which supplies additional arguments (EXTRA_ARGS) to mysqld_safe. This unit is used to bootstrap a node with the following command:
systemctl start mysql@bootstrap.service
This reads the file /etc/sysconfig/mysql.bootstrap and passes the additional argument EXTRA_ARGS=" --wsrep-new-cluster " to mysqld_safe.
Systemd brings many advantages over earlier init systems such as sysvinit and Upstart, including but not limited to: socket and D-Bus activation during startup; on-demand startup of services; aggressive parallelization capabilities; and many others. However, it presents an issue with the synchronization of nodes in PXC, as discussed below.
PXC Issue with Systemd
PXC SST fails to sync the nodes, so the joining node cannot join the cluster, and both the xtrabackup-v2 and rsync methods report broken pipe errors (Errcode: 32). Despite the error messages logged on both the donor and the joining node, the ports are open, the nodes are communicating well, and SST itself is working correctly.
On Donor Node
>> log scanned up to (15099371495026)
>> log scanned up to (15099371495026)
>> log scanned up to (15099371495026)
>> log scanned up to (15099371495026)
>> log scanned up to (15099371495026)
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xb_stream_write_data() failed.
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xtrabackup: Error: xtrabackup_copy_datafile() failed.
xtrabackup: Error: failed to copy datafile.
Backup Progress and Error log:
socat E write(3, 0xe79200, 8192): Broken pipe
donor: => Rate:[8.39MiB/s] Avg:[8.39MiB/s] Elapsed:2:28:31
WSREP_SST: [INFO] NOTE: donor-SST took 8912 seconds
WSREP_SST: [ERROR] innobackupex finished with error: 1. Check innobackup.backup.log
WSREP_SST: [ERROR] Cleanup after exit with status:22
On Joining Node
joiner: => Rate:[1.83MiB/s] Avg:[6.88MiB/s] Elapsed:2:28:10
joiner: => Rate:[3.34MiB/s] Avg:[6.88MiB/s] Elapsed:2:28:20
WSREP_SST: [ERROR] Removing /.sst/xtrabackup_galera_info file due to signal
WSREP_SST: [ERROR] Cleanup after exit with status:143
[ERROR] WSREP: Process was aborted.
systemd: mysql.service: control process exited, code=exited status=2
mysqld_safe: 161115 20:46:15 mysqld_safe mysqld from pid file /b001/app/mysql/mysql.pid ended
mysql-systemd: WARNING: mysql pid file /b001/app/mysql/mysql.pid empty or not readable
mysql-systemd: WARNING: mysql may be already dead
systemd: Failed to start Percona XtraDB Cluster.
systemd: Unit mysql.service entered failed state.
systemd: mysql.service failed.
Explanation of the Problem
On the joining node, SST takes so long to sync during startup that the startup script times out and fails. Unfortunately, the error message in the logs is misleading because it points to a broken pipe in the SST streaming method. Note that SST fails at almost exactly the same point each time, after approximately 2 hours 28 minutes.
Check Systemd process and startup scripts for MySQL:
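The mysql Systemd unit file (/usr/lib/systemd/system/mysql.service) calls the helper script /usr/bin/mysql-systemd, which contains the timeout settings. Normally, you would not notice similar issues on Linux 6 releases (for example, CentOS 6), which do not use Systemd.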
Look for these variables and lines in /usr/bin/mysql-systemd
service_startup_timeout=900 # Default
startup_sleep=1

[[ $i -lt $service_startup_timeout ]]

if [[ $verb = 'created' ]]; then
    if ([[ -e $sst_progress_file ]] || grep -q -- '--wsrep-new-cluster' <<< "$env_args") \
        && [[ $startup_sleep -ne 10 ]]; then
        echo "State transfer in progress, setting sleep higher"
        startup_sleep=10 # increments in 10sec based on SST progress.
    fi
fi
i=$(( i+1 ))
sleep $startup_sleep
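One way to locate these settings on a node is to search the script for the two variable names. The following is a minimal sketch; show_timeout_settings is a hypothetical helper name, not part of PXC, and the values it prints may differ between PXC releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the timeout-related lines of a startup script,
# each prefixed with its line number.
# $1 = path to the script (on a PXC node, /usr/bin/mysql-systemd).
show_timeout_settings() {
    local script=$1
    grep -nE 'service_startup_timeout|startup_sleep' "$script"
}

# Example on a PXC node:
#   show_timeout_settings /usr/bin/mysql-systemd
```

The line numbers in the output make it easier to find the surrounding loop before changing any value.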
During SST, the startup script sleeps for 10 seconds per iteration instead of 1, so the 900 iterations allowed by service_startup_timeout add up to at most 900 × 10 = 9,000 seconds (2 hours 30 minutes). This matches the failures observed at about 2 hours 28 minutes. On slower networks, with several hundred gigabytes of data, SST can take longer than that and these settings will not be sufficient. Increase them according to your database size, network bandwidth, and how long SST takes to finish. During testing on a slower network with ~400G of data, SST took ~9 hours to finish successfully.
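The arithmetic behind these numbers can be sketched in a few lines of shell. This is an illustration only: max_wait_seconds and iterations_needed are hypothetical helper names, and the model assumes the loop behaves exactly as in the excerpt above (one sleep per iteration, 1 second before SST is detected and 10 seconds afterwards).

```shell
#!/usr/bin/env bash
# Hypothetical helper: total seconds the startup loop waits before giving up.
# $1 = service_startup_timeout (number of loop iterations)
# $2 = iterations spent at the 1-second sleep before SST is detected
#      (an assumed value; it depends on when the SST progress file appears)
max_wait_seconds() {
    local iterations=$1 fast_iterations=$2
    echo $(( fast_iterations * 1 + (iterations - fast_iterations) * 10 ))
}

# Hypothetical helper: smallest service_startup_timeout that covers an SST
# lasting $1 seconds, assuming every iteration sleeps 10 seconds (rounds up).
iterations_needed() {
    local sst_seconds=$1
    echo $(( (sst_seconds + 9) / 10 ))
}

max_wait_seconds 900 0             # prints 9000 (2 hours 30 minutes)
iterations_needed $(( 9 * 3600 ))  # prints 3240 for a 9-hour SST
```

Under this model, an SST of about 9 hours needs service_startup_timeout to be at least 3240, and in practice some headroom above that is advisable.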
This issue is not present in older versions of Linux that do not use Systemd. If you are running a PXC cluster on CentOS 7 with Systemd, review the timeout settings referenced across these configuration files and set them high enough to cover your SST duration. Let's hope the issue will be addressed in future releases.