Author: MySQL Performance Blog

Improved wsrep-stages and related instrumentation in Percona XtraDB Cluster

In this blog post, we’ll look at how we’ve improved wsrep-stages and related instrumentation in Percona XtraDB Cluster.
Introduction
When you execute a workload and need to find out what a given thread is working on, "SHOW PROCESSLIST" is the first thing that comes to mind. It is an effective way to track thread status. We decided to improve the stages in Percona XtraDB Cluster to make "SHOW PROCESSLIST" more meaningful.
In the blog below, we will check out the different wsrep-stages and the significance associated with them.
Loading of data
Running a simple insert/sysbench prepare workload. The state is stable as it mainly captures MySQL stages indicating that the table is being updated:
| 9 | root | localhost | test | Query | 0 | update | INSERT INTO sbtest3(id, k, c, pad) VALUES(893929,515608,'28459951974-62599818307-78562787160-7859397 | 0 | 0 |

Running UPDATE workload
Running a simple sysbench update-key workload. Let's look at the different states that the user sees and their significance. (MASTER and SLAVE states are different and are marked accordingly.)
MASTER view:

This stage indicates that the write-set is trying to replicate. Global sequence numbers are assigned after the write-set is replicated and so the global-seqno is currently -1:

| 80 | root | localhost | test | Query | 0 | wsrep: initiating replication for write set (-1) | UPDATE sbtest4 SET k=k+1 WHERE id=502338 | 0 | 1 |

This stage indicates successful replication of the write-set. This means the write-set is now added to the group-channel. Global-seqno is updated in the message too:

| 79 | root | localhost | test | Query | 0 | wsrep: write set replicated (196575) | UPDATE sbtest3 SET k=k+1 WHERE id=502723 | 0 | 1 |

This stage indicates the write-set has successfully passed the certification stage (making its path clear for commit):

| 85 | root | localhost | test | Query | 0 | wsrep: pre-commit/certification passed (196574) | UPDATE sbtest7 SET k=k+1 WHERE id=495551 | 0 | 1 |

This stage indicates that InnoDB commit has been triggered for the write-set:

| 138 | root | localhost | test | Query | 0 | innobase_commit_low (585721) | UPDATE sbtest6 SET k=k+1 WHERE id=500551 | 0 | 1 |
SLAVE/Replicating node view:

This stage indicates that the slave thread is trying to commit the replicated write-set with the given seqno. It is likely waiting for its turn at the Commit Monitor:

|  6 | system user |           | NULL | Sleep   |    0 | wsrep: committing write set (224905) | NULL             |         0 |             0 |

This stage indicates a successful commit of the replicated write-set with the given seqno:

| 2 | system user | | NULL | Sleep | 0 | wsrep: committed write set (224896) | NULL | 0 | 0 |

This stage indicates that updating the rows is in progress. (Previously, it was often difficult to know what the workload was trying to do: UPDATE/INSERT/DELETE.) Now there is an easy way to find out:

| 13 | system user |           | NULL | Sleep   |    0 | wsrep: updating row for write-set (178711) | NULL             |         0 |             0 |

| 18 | system user | | NULL | Sleep | 0 | wsrep: writing row for write-set (793454) | NULL | 0 | 0 |

| 11 | system user | | NULL | Sleep | 0 | wsrep: deleting row for write-set (842123) | NULL | 0 | 0 |

This stage indicates that the given write-set is being applied:

| 10 | system user | | NULL | Sleep | 0 | wsrep: applying write-set (899370) | NULL | 0 | 0 |

Improved Instrumentation
Let’s answer some simple questions that most profiling experts will face:
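Before diving in, note that these timings come from Performance Schema stage events, so the wsrep stage instruments and the stage consumers must be enabled first. A minimal sketch, assuming a stock performance_schema setup (the instrument and consumer names follow the standard setup tables):

mysql> update performance_schema.setup_instruments set enabled='YES', timed='YES' where name like 'stage/wsrep/%';
mysql> update performance_schema.setup_consumers set enabled='YES' where name like '%stages%';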

How long did replication take (adding write-set to channel)?

mysql> select event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%in replicate%' order by time_in_mics desc limit 5;
+---------------------------------------+--------------+
| event_name                            | time_in_mics |
+---------------------------------------+--------------+
| stage/wsrep/wsrep: in replicate stage |       1.2020 |
| stage/wsrep/wsrep: in replicate stage |       0.7880 |
| stage/wsrep/wsrep: in replicate stage |       0.7740 |
| stage/wsrep/wsrep: in replicate stage |       0.7550 |
| stage/wsrep/wsrep: in replicate stage |       0.7480 |
+---------------------------------------+--------------+
5 rows in set (0.01 sec)

How long did it take for pre-commit/certification checks?

mysql> select event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%in pre-commit%' order by time_in_mics desc limit 5;
+----------------------------------------+--------------+
| event_name                             | time_in_mics |
+----------------------------------------+--------------+
| stage/wsrep/wsrep: in pre-commit stage |       1.3450 |
| stage/wsrep/wsrep: in pre-commit stage |       1.0000 |
| stage/wsrep/wsrep: in pre-commit stage |       0.9480 |
| stage/wsrep/wsrep: in pre-commit stage |       0.9180 |
| stage/wsrep/wsrep: in pre-commit stage |       0.9030 |
+----------------------------------------+--------------+
5 rows in set (0.01 sec)

How long did it take to commit a transaction on the slave (slave_thread=16 threads)?

mysql> select thread_id, event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%committing%' order by time_in_mics desc limit 5;
+-----------+-------------------------------+--------------+
| thread_id | event_name                    | time_in_mics |
+-----------+-------------------------------+--------------+
|        56 | stage/wsrep/wsrep: committing |       0.5870 |
|        58 | stage/wsrep/wsrep: committing |       0.5860 |
|        47 | stage/wsrep/wsrep: committing |       0.5810 |
|        59 | stage/wsrep/wsrep: committing |       0.5740 |
|        60 | stage/wsrep/wsrep: committing |       0.5220 |
+-----------+-------------------------------+--------------+
5 rows in set (0.00 sec)

Increasing the number of slave threads creates more contention (slave_thread=64):

mysql> select thread_id, event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%committing%' order by time_in_mics desc limit 5;
+-----------+-------------------------------+--------------+
| thread_id | event_name                    | time_in_mics |
+-----------+-------------------------------+--------------+
|        90 | stage/wsrep/wsrep: committing |       1.6930 |
|        97 | stage/wsrep/wsrep: committing |       1.5870 |
|       103 | stage/wsrep/wsrep: committing |       1.5140 |
|        87 | stage/wsrep/wsrep: committing |       1.2560 |
|       102 | stage/wsrep/wsrep: committing |       1.1040 |
+-----------+-------------------------------+--------------+
5 rows in set (0.00 sec)

The amount of time taken to apply a write-set:

mysql> select thread_id, event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%applying%' order by time_in_mics desc limit 5;
+-----------+---------------------------------------+--------------+
| thread_id | event_name                            | time_in_mics |
+-----------+---------------------------------------+--------------+
|       166 | stage/wsrep/wsrep: applying write set |       1.6880 |
|       168 | stage/wsrep/wsrep: applying write set |       1.5820 |
|       146 | stage/wsrep/wsrep: applying write set |       1.5270 |
|       124 | stage/wsrep/wsrep: applying write set |       1.4760 |
|       120 | stage/wsrep/wsrep: applying write set |       1.4440 |
+-----------+---------------------------------------+--------------+
5 rows in set (0.00 sec)
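Beyond sampling events_stages_history, the aggregate summary table answers "where did the time go overall?" in one query. A sketch using the standard Performance Schema summary table (timer values are in picoseconds, hence the divisor):

mysql> select event_name, count_star, sum_timer_wait/1000000 as total_mics from performance_schema.events_stages_summary_global_by_event_name where event_name like 'stage/wsrep/%' order by sum_timer_wait desc;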

Conclusion
The improved wsrep-stage framework makes it more effective for a user to find out the state of a given thread. Using the derived instrumentation through wsrep-stage is a good way to understand where the time is being spent.

Percona XtraDB Cluster Transaction Replay Anomaly

In this blog post, we’ll look at a transaction replay anomaly in Percona XtraDB Cluster.
Introduction
Percona XtraDB Cluster/Galera replays a transaction if the data is non-conflicting but the transaction happens to hold conflicting locks.
Anomaly
Let’s understand this with an example:

Let’s assume a two-node cluster (node-1 and node-2)
Base table “t” is created as follows:

create database test;
use test;
create table t (i int, c char(20), primary key pk(i)) engine=innodb;
insert into t values (1, 'abc'), (2, 'abc'), (4, 'abc');

mysql> select * from t;
+---+------+
| i | c    |
+---+------+
| 1 | abc  |
| 2 | abc  |
| 4 | abc  |
+---+------+

node-2 starts running a transaction (trx-2):

trx-2: update t set c = 'pqr';

node-2 creates a write-set and is just about to replicate it. At the same time, node-1 executes the following transaction (trx-1), and is first to add it to the group-channel (before node-2 adds trx-2):

trx-1: insert into t values (3, 'a');

trx-1 is replicated on node-2, and it proceeds with the apply action. Since there is a lock conflict (no certification conflict), node-2's local transaction (trx-2) is aborted and scheduled for replay.
trx-1 causes the addition of (3, 'a'), and then the node-2 transaction is REPLAYed.
REPLAY is done using the pre-created write-set that only modifies the existing entries (1, 2, 4).

End-result:
mysql> select * from t;
+---+------+
| i | c    |
+---+------+
| 1 | pqr  |
| 2 | pqr  |
| 3 | a    |
| 4 | pqr  |
+---+------+

At first, nothing looks wrong. If you look closely, however, the REPLAYed transaction "UPDATE t set c = 'pqr'" was the last to commit. But its full effect is not seen, as there is still a row (3, 'a') that has 'a' instead of 'pqr'.

The binary log events confirm the ordering, with trx-1's Write_rows committing before trx-2's Update_rows:

| mysql-bin.000003 |  792 | Gtid        | 2 |  857 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:4' |
| mysql-bin.000003 |  857 | Query       | 2 |  925 | BEGIN |
| mysql-bin.000003 |  925 | Table_map   | 2 |  972 | table_id: 219 (test.t) |
| mysql-bin.000003 |  972 | Write_rows  | 2 | 1014 | table_id: 219 flags: STMT_END_F |
| mysql-bin.000003 | 1014 | Xid         | 2 | 1045 | COMMIT /* xid=4 */ |
| mysql-bin.000003 | 1045 | Gtid        | 3 | 1110 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:5' |
| mysql-bin.000003 | 1110 | Query       | 3 | 1187 | BEGIN |
| mysql-bin.000003 | 1187 | Table_map   | 3 | 1234 | table_id: 219 (test.t) |
| mysql-bin.000003 | 1234 | Update_rows | 3 | 1324 | table_id: 219 flags: STMT_END_F |
| mysql-bin.000003 | 1324 | Xid         | 3 | 1355 | COMMIT /* xid=5 */ |
+------------------+------+-------------+---+------+--------------------------------------------------------------------+
21 rows in set (0.00 sec)

We have used a simple char string, but suppose there were a constraint here, such as "every row must have the new value of c once the UPDATE completes". That constraint would be violated even though the application reports the UPDATE as a success.
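To make this concrete, here is a hypothetical invariant check an application might run after the UPDATE reports success (the check itself is illustrative and not part of the original test):

mysql> select count(*) as stale_rows from t where c <> 'pqr';

On node-2 this returns 1 (the row (3, 'a')), even though the REPLAYed UPDATE was the last transaction to commit.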
It is interesting to note what happens on node-1:

node-1 applies the local transaction (trx-1) and then gets the replicated write-set from node-2 (trx-2) that has changes only for (1,2,4). Thereby data consistency is not compromised.

BEWARE: Increasing fc_limit can affect SELECT latency

In this blog post, we’ll look at how increasing the fc_limit can affect SELECT latency.
Introduction
Recent Percona XtraDB Cluster optimizations have exposed fc_limit contention. It was always there, but was never exposed as the Commit Monitor contention was more significant. As it happens with any optimization, once we solve the bigger contention issues, smaller contention issues start popping up. We have seen this pattern in InnoDB, and Percona XtraDB Cluster is no exception. In fact, it is good because it tells us that we are on the right track.
If you haven’t yet checked the performance blogs, then please visit here and here.
What is FC_LIMIT?
Percona XtraDB Cluster has the concept of Flow Control. If any member of the cluster (not garbd) is unable to match the apply speed with the replicated write-set speed, then the queue builds up. If this queue crosses some threshold (dictated by gcs.fc_limit), then flow control kicks in. Flow control causes members of the cluster to temporarily halt/slow down so that the slower node can catch up.
The user can, of course, disable this by setting wsrep_desync=1 on the slower node, but make sure you understand the effect of doing so. Unless you have a good reason, you should avoid setting it.
mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+------------+
| Variable_name               | Value      |
+-----------------------------+------------+
| wsrep_flow_control_interval | [ 16, 16 ] |
+-----------------------------+------------+
1 row in set (0.01 sec)
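To see how close a node is to triggering flow control, you can watch the receive queue and the pause counter. A sketch using standard wsrep status variables (the values will vary with load):

mysql> show global status like 'wsrep_local_recv_queue%';
mysql> show global status like 'wsrep_flow_control_paused';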

Increasing fc_limit
Until recently, the default fc_limit was 16 (starting with Percona XtraDB Cluster 5.7.17-29.20, the default is 100). This worked until now, since Percona XtraDB Cluster failed to scale and rarely hit the limit of 16. With new optimizations, Percona XtraDB Cluster nodes can process more write-sets in a given time period, and thereby can replicate more write-sets (anywhere in the range of three to ten times). Of course, the replicating/slave nodes are also performing at a higher speed. But depending on the slave threads, it is easy to start hitting this limit.
So what is the solution?

Increase fc_limit from 16 to something really big. Say 1600.

Is this correct?
YES and NO.
Why YES?

If you don’t care about the freshness of data on the replicated nodes, then increasing the limit to a higher value is not an issue. Say setting it to 10K means that the replicating node is holding 10K write-sets to replicate, and a SELECT fired during this time will not view changes from these 10K write-sets.
But if you insist on having fresh data, then Percona XtraDB Cluster has a solution for this (set wsrep_sync_wait=7).
Setting wsrep_sync_wait places the SELECT request in a queue that is serviced only after the replicated write-sets that existed at the point the SELECT was fired have been applied. If the queue has 8K write-sets, then the SELECT is placed at position 8K+1. As the queue progresses, the SELECT gets serviced only when all those 8K write-sets are done. This drastically increases SELECT latency and can cause monitoring alarms to fire.
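Note that wsrep_sync_wait is settable per session, so one mitigation sketch is to pay the synchronization cost only in the sessions that truly need fresh reads (the table name below is illustrative):

mysql> set session wsrep_sync_wait=7;
mysql> select * from important_table;   -- this SELECT waits for the apply queue
mysql> set session wsrep_sync_wait=0;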

Why NO?

For the reason mentioned above, we feel it is not a good idea to increase the fc_limit beyond some value unless you don’t care about data freshness and in turn don’t care to set wsrep_sync_wait.
We did a small experiment with the latest Percona XtraDB Cluster release to understand the effects.

- Started a 2-node cluster.
- Fired a 64-thread workload on node-1 of the cluster.
- node-2 acts as a replicating slave without any active workload.
- Set wsrep_sync_wait=7 on node-2 to ensure data freshness.
Using default fc_limit (= 16)
-----------------------------
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k)      |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.03 sec)
Increasing it from 16 -> 1600
-----------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=1600";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k)      |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.46 sec)
That is a whopping 15x increase in SELECT latency.
Increasing it even further (1600 -> 25000)
------------------------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=25000";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k)      |
+-------------+
| 22499552612 |
+-------------+
1 row in set (7.07 sec)
Note: wsrep_sync_wait=7 enforces the check for all statements (INSERT/UPDATE/DELETE/SELECT). We highlighted the SELECT example, as that is the most concerning at first glance. But latency for the other statements also increases for the same reasons as mentioned above.
Conclusion
Let’s conclude with the following observation:

Avoid increasing fc_limit to an insanely high value as it can affect SELECT latency (if you are running a SELECT session with wsrep_sync_wait=7 for data freshness).

Percona Server for MySQL in Docker Swarm with Secrets

This quick post demonstrates using Percona Server for MySQL in Docker Swarm with some new authentication provisioning practices.
Some small changes to the startup script for the Percona-Server container image allow us to specify a file containing the password value to set as our root user's password. "Why do we need this functionality?" I hear you cry. When we use an environment variable, it's not terribly hard to locate the value someone has set as their database root password. Environment variables are not well suited for sensitive data. We preach against leaving important passwords in easy-to-reach places, so moving towards something more secure whilst retaining usability is desirable. I'll detail the current methods, the problems, and finish off with Docker Secrets, which in my opinion is the right direction to be heading.
Environment Variables
I’ll elaborate on the main reason why we would want to change from the default given behavior. In the documentation for using the MySQL/Percona and MariaDB containers, we are invited to start containers with an environment variable to control what the instance’s root password is set as upon startup. Let’s demonstrate with the latest official Percona-Server image from Percona’s repository of images on the Docker Hub registry:
moore@chom:~$ docker pull percona/percona-server:latest
latest: Pulling from percona/percona-server
e12c678537ae: Already exists
65ab4b835640: Pull complete
f63269a127d1: Pull complete
757a4fef28b8: Pull complete
b0cb547a5105: Pull complete
4214179ba9ea: Pull complete
155dafd2fd9c: Pull complete
848020b1da10: Pull complete
771687fe7e8b: Pull complete
Digest: sha256:f3197cac76cccd40c3525891ce16b0e9f6d650ccef76e993ed7a22654dc05b73
Status: Downloaded newer image for percona/percona-server:latest
Then start a container:
moore@chom:~$ docker run -d \
--name percona-server-1 \
-e MYSQL_ROOT_PASSWORD='secret' \
percona/percona-server
d08f299a872f1408c142b58bc2ce8e59004acfdb26dca93d71f5e9367b4f2a57
moore@chom:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d08f299a872f percona/percona-server "/entrypoint.sh " 32 seconds ago Up 31 seconds 3306/tcp percona-server-1
Looks good, eh? Let’s inspect this container a little closer to reveal why this method is flawed:
moore@chom:~$ docker inspect --format '{{ index (index .Config.Env) 0}}' percona-server-1
MYSQL_ROOT_PASSWORD=secret
*facepalm*
We don’t want the root password exposed here, not really. If we wanted to use this method in docker-compose files, we would also be storing passwords inline, which isn’t considered a secure practice.
Environment File
Another approach is to use an environment file. This is simply a file that we can provide docker run or docker-compose in order to instantiate the environment variables within the container. It’s a construct for convenience. So just to illustrate that we have the same problem, the next example uses the mechanism of an environment file for our database container:
moore@chom:~$ echo 'MYSQL_ROOT_PASSWORD=secret' > /tmp/ps.env
moore@chom:~$ docker run -d --name percona-server-2 --env-file=/tmp/ps.env percona/percona-server
d5105d044673bd5912e0e29c2f56fa37c5f174d9d2a4811ceaba284092837c84
moore@chom:~$ docker inspect --format '{{ index (index .Config.Env) 0}}' percona-server-2
MYSQL_ROOT_PASSWORD=secret
NOTE: shortly after starting, this container failed because we didn't provide valid MySQL root password options.
While we’re not specifying it in our docker run command or our docker-compose.yml file, the password value remains on our filesystem within the environment file. Again, not ideal.
Password File
Using a password file obscures the password value from the inspect output. Let's roll through the steps we would use to leverage this new option. With our new Percona-Server image, we're going to start a container, but first let's create an arbitrary file containing our desired password:
moore@chom:~$ docker:cloud> echo "secret" > /tmp/mysql_pwd_file
Now start a container where we’re going to bind mount the file, and use our new environment variable to point to it:
moore@chom:~$ docker run -v /tmp/mysql_pwd_file:/tmp/mysqlpwd --name percona-secret \
-e MYSQL_ROOT_PASSWORD_FILE=/tmp/mysqlpwd percona/percona-server:latest
With the same inspect command, let’s show that there’s no snooping on our password value:
moore@chom:~$ docker inspect --format '{{ index (index .Config.Env) 0}}' percona-secret
MYSQL_ROOT_PASSWORD_FILE=/tmp/mysqlpwd
We are revealing the path where our password was read from within the container. For those eagle-eyed readers, this file was just a bind mounted file in the docker run command, and it’s still on the host’s filesystem.
moore@chom:~$ cat /tmp/mysql_pwd_file
secret
moore@chom:~$ docker exec percona-secret cat /tmp/mysqlpwd
secret
Not perfect, because we need to have that file available on all of our Docker hosts, but it works and we’re closer to a more robust solution.
Docker Secrets
The main reason for the new environment variable is to leverage the Docker Secrets feature. Since Docker version 1.13 (17.03 is now GA), we have the Docker Secrets feature; however, it's only available to the Docker Swarm workflow. If you're not already working with Docker Swarm mode, I can't recommend it enough. It's part of Docker Engine, simple to get started with, and intuitive, and since 1.13 it is compatible with docker-compose files. You don't need to have a cluster of hardware; it's entirely valid to use Docker Swarm on a single node. This allows you to test on your local environment with ease.
I won’t waste pixels explaining what’s already well documented in official channels, but in summary: Docker secrets is a new feature that allows us to keep sensitive information out of source code and configuration files. Secrets are stored in the Raft log which is encrypted and replicated throughout the Docker Swarm cluster. The protection and distribution come for free out of the box, which is a fantastic feature if you ask me.
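If you don't have a Swarm yet, a single host is enough to follow along. A sketch, assuming a Docker 1.13+ engine:

docker swarm init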
So, let’s create a Docker Secret. Please note that I’ve moved to my Docker Swarm installation for this next part:
moore@chom:~$ docker:cloud> docker info | egrep -i 'swarm|version'
Server Version: 17.03.0-ce
Swarm: active
Operating as a swarm manager, we have the ability to create a new secret to serve as our root user's password:
moore@chom:~$ docker:cloud> echo "{secret_string}" | docker secret create mysql_root_password -
ugd8dx0kae9hbyt4opbolukgi
We can list all of our existing secrets:
moore@chom:~$ docker:cloud> docker secret ls
ID NAME CREATED UPDATED
ugd8dx0kae9hbyt4opbolukgi mysql_root_password Less than a second ago Less than a second ago
Now that our secret has been created, it is obscured from us. We are unable to see its value:
moore@chom:~$ docker secret inspect mysql_root_password
[
    {
        "ID": "ugd8dx0kae9hbyt4opbolukgi",
        "Version": {
            "Index": 905780
        },
        "CreatedAt": "2017-04-11T23:33:08.118037434Z",
        "UpdatedAt": "2017-04-11T23:33:08.118037434Z",
        "Spec": {
            "Name": "mysql_root_password"
        }
    }
]
Now we can use our secret to set our authentication for the MySQL instance by doing the following:
moore@chom:~$ docker service create \
--name percona-secret \
--secret mysql_root_password \
-e MYSQL_ROOT_PASSWORD_FILE=/run/secrets/mysql_root_password \
percona/percona-server:latest
You can see that instead of docker run, I've issued the swarm equivalent docker service create, which is going to start a new Percona-Server container in the scope of my Swarm workflow. I'm also using the --secret option to tell docker to mount my secret in the container, which gets mounted to a file under the path /run/secrets/{secret_name}. The final point here: I'm passing MYSQL_ROOT_PASSWORD_FILE=/path/to/secret as an environment variable to let the startup script know where to find the file with my secret value for the root password. Once the startup routine has completed and the container has started successfully, I can connect to my container to test that the password was set correctly:
moore@chom:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
397bdf9b75f9 percona/percona-server "/entrypoint.sh " 46 seconds ago Up 44 seconds 3306/tcp percona-secret.1.9xvbneset9363dr5xv4fqqxua
moore@chom:~$ docker exec -ti 397bdf9b75f9 bash
mysql@397bdf9b75f9:/$ cat /run/secrets/mysql_root_password
{secret_string}
mysql@397bdf9b75f9:/$ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.17-11 Percona Server (GPL), Release '11', Revision 'f60191c'
Copyright (c) 2009-2016 Percona LLC and/or its affiliates
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
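A related note on rotation: secrets are immutable, so changing one means creating a new secret and swapping it on the service. A sketch (the v2 name is illustrative; also note the startup script only reads the password file when initializing a fresh datadir, so this does not retroactively change an already-set root password):

echo "{new_secret_string}" | docker secret create mysql_root_password_v2 -
docker service update \
--secret-rm mysql_root_password \
--secret-add source=mysql_root_password_v2,target=mysql_root_password \
percona-secret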
The secret can be shared around any container where it’s necessary, simply by telling Docker to use the secret when instantiating a container. For example, if I wanted to start an application container such as a WordPress instance, I can use a secret object to easily share credentials to the data source safely and consistently.
This method is totally viable for other forms of sensitive data. For example, I can generate SSL certificates and use Docker secrets to add them to my containers for encrypted replication or forcing secure logins from remote clients. I’m still thinking of all the possible use cases for this option and no doubt will have some more to share with you in the near future.
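As a sketch of that idea (the file and service names are illustrative), TLS material can be distributed the same way and will appear under /run/secrets/ inside the containers:

docker secret create repl_ca ca.pem
docker secret create repl_cert server-cert.pem
docker secret create repl_key server-key.pem
docker service create --name percona-ssl \
--secret repl_ca --secret repl_cert --secret repl_key \
percona/percona-server:latest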
Please share your comments, suggestions and corrections in the comments below. Thank you for reading.

Simplified Percona XtraDB Cluster SSL Configuration

In this blog post, we'll look at a feature recently added to Percona XtraDB Cluster 5.7.16 that makes it easier to configure Percona XtraDB Cluster SSL for all related communications. It uses the mode "encrypt=4" and configures SSL for both IST/Galera communications and SST communications using the same SSL files. "encrypt=4" is a new encryption mode added in Percona XtraDB Cluster 5.7.16 (we'll cover it in a later blog post).
If this option is used, this will override all other Galera/SST SSL-related file options. This is to ensure that a consistent configuration is applied.
Using this option also means that the Galera/SST communications are using the same keys as client connections.
Example
This example shows how to start up a cluster using this option. We will use the default SSL files created by the bootstrap node. Basically, there are two steps:

Set pxc_encrypt_cluster_traffic=ON on all nodes
Ensure that all nodes share the same SSL files

Step 1: Configuration (on all nodes)
We enable the pxc_encrypt_cluster_traffic option in the configuration files on all nodes. The default value of this option is "OFF", so we enable it here.
[mysqld]
pxc_encrypt_cluster_traffic=ON

Step 2: Start up the bootstrap node
After initializing and starting up the bootstrap node, the datadir will contain the necessary data files. Here is some SSL-related log output:
[Note] Auto generated SSL certificates are placed in data directory.
[Warning] CA certificate ca.pem is self signed.
[Note] Auto generated RSA key files are placed in data directory.
The required files are ca.pem, server-cert.pem and server-key.pem, which are the Certificate Authority (CA) file, the server certificate and the server private key, respectively.
Step 3: Copy the SSL files to all other nodes
Galera views the cluster as a set of homogeneous nodes, so the same configuration is expected on all nodes. Therefore, we have to copy the CA file, the server’s certificate and the server’s private key. By default, MySQL names these: ca.pem, server-cert.pem, and server-key.pem, respectively.
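For example, copying from the bootstrap node to a joining node could look like the following sketch (the hostname and datadir path are illustrative):

scp /var/lib/mysql/ca.pem /var/lib/mysql/server-cert.pem /var/lib/mysql/server-key.pem \
node2:/var/lib/mysql/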
Step 4: Start up the other nodes
This is some log output showing that the SSL certificate files have been found. The other nodes should be using the files that were created on the bootstrap node.
[Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
[Note] Skipping generation of SSL certificates as certificate files are present in data directory.
[Warning] CA certificate ca.pem is self signed.
[Note] Skipping generation of RSA key pair as key files are present in data directory.
This is some log output (with log_error_verbosity=3) showing the SST reporting on the configuration used:
WSREP_SST: [DEBUG] pxc_encrypt_cluster_traffic is enabled, using PXC auto-ssl configuration
WSREP_SST: [DEBUG] with encrypt=4 ssl_ca=/my/data//ca.pem ssl_cert=/my/data//server-cert.pem ssl_key=/my/data//server-key.pem

Customization
The "ssl-ca", "ssl-cert", and "ssl-key" options in the "[mysqld]" section can be used to specify the location of the SSL files. If these are not specified, then the datadir is searched (using the default names of "ca.pem", "server-cert.pem" and "server-key.pem").
[mysqld]
pxc_encrypt_cluster_traffic=ON
ssl-ca=/path/to/ca.pem
ssl-cert=/path/to/server-cert.pem
ssl-key=/path/to/server-key.pem
If you want to implement this yourself, the equivalent configuration file options are:
[mysqld]
wsrep_provider_options="socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem"
[sst]
encrypt=4
ssl-ca=ca.pem
ssl-cert=server-cert.pem
ssl-key=server-key.pem

How it works

Determine the location of the SSL files

Uses the values if explicitly specified (via the "ssl-ca", "ssl-cert" and "ssl-key" options in the "[mysqld]" section)
If the SSL file options are not specified, we look in the data directory for files named "ca.pem", "server-cert.pem" and "server-key.pem" for the CA file, the server certificate, and the server key, respectively.

Modify the configuration

Overrides the values for socket.ssl_ca, socket.ssl_cert, and socket.ssl_key in wsrep_provider_options in the "[mysqld]" section.
Sets "encrypt=4" in the "[sst]" section.
Overrides the values for ssl-ca, ssl-cert and ssl-key in the "[sst]" section.

This is not a dynamic setting, and is only available on startup.

Tracking IST Progress in Percona XtraDB Cluster

In this blog post, we’ll look at how Percona XtraDB Cluster uses IST.
Introduction
Percona XtraDB Cluster uses the concept of an Incremental State Transfer (IST). When a node of the cluster leaves the cluster for a short period of time, it can rejoin the cluster by getting the delta set of missing changes from any active node in the cluster.
This process of getting the delta set of changes is called IST in Percona XtraDB Cluster.
Tracking IST Progress
The number of write-sets/changes that the joining node needs to catch up on when rejoining the cluster is dictated by:

The duration the node was not present in the cluster
The workload of the cluster during that time frame

This catch-up process can be time-consuming. Until this process is complete, the rejoining node is not ready to process any active workloads.
We believe that any process that is time-consuming should have a progress monitor attached to it. This is exactly what we have done.
In the latest release of Percona XtraDB Cluster 5.7.17-29.20, we added an IST progress monitor that is exposed through SHOW STATUS. This helps you to monitor the percentage of write-sets that have been applied by the rejoining node.
Let’s see this in a working example:

Start a two-node cluster
Process some basic workloads, allow cluster replication
Shutdown node-2
Node-1 then continues to process more workloads (the workload fits the allocated gcache)
Restart Node-2, causing it to trigger an IST

mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| wsrep_ist_receive_status | 3% complete, received seqno 1421771 of 1415410-1589676  |
+--------------------------+---------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+----------------------------------------------------------+
| Variable_name            | Value                                                    |
+--------------------------+----------------------------------------------------------+
| wsrep_ist_receive_status | 52% complete, received seqno 1506799 of 1415410-1589676  |
+--------------------------+----------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+----------------------------------------------------------+
| Variable_name            | Value                                                    |
+--------------------------+----------------------------------------------------------+
| wsrep_ist_receive_status | 97% complete, received seqno 1585923 of 1415410-1589676  |
+--------------------------+----------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| wsrep_ist_receive_status |       |
+--------------------------+-------+
1 row in set (0.00 sec)
As you can see, the wsrep_ist_receive_status monitoring string indicates the percentage completed, the currently received write-set, and the total range of write-sets applicable to the IST.
Once the IST activity is complete, the variable shows an empty string.
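If you want to follow the progress from a shell, a simple polling loop works. A sketch (credentials and the one-second interval are illustrative):

while true; do
  status=$(mysql -N -B -e "show status like 'wsrep_ist_receive_status'" | cut -f2)
  [ -z "$status" ] && break   # an empty string means the IST has completed
  echo "$status"
  sleep 1
done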
Closing Comments
I hope you enjoy this newly added feature. Percona Engineering would be happy to hear from you about more such features that can help you make effective use of Percona XtraDB Cluster. We will try our best to include them in our future plans (based on feasibility).
Note: Special thanks to Kenn Takara and Roel Van de Paar for helping me edit this post.

How We Made Percona XtraDB Cluster Scale

In this blog post, we’ll look at the actions and efforts Percona experts took to scale Percona XtraDB Cluster.
Introduction
When we first started analyzing Percona XtraDB Cluster performance, it was pretty bad. We would see contention even with 16 threads. Performance was even worse with sync_binlog=1, although the same pattern was observed even with the binary log disabled. The effect was not limited to OLTP workloads; other workloads (like update-key/non-key) were affected even more broadly.
That's when we started analyzing the contention issues and found multiple problems. We will discuss all these problems and the solutions we adopted. But before that, let's look at the current performance level.
Check this blog post for more details.
The good news is Percona XtraDB Cluster is now optimized to scale well for all scenarios, and the gain is in the range of 3x-10x.
Understanding How MySQL Commits a Transaction
Percona XtraDB Cluster contention is associated mainly with Commit Monitor contention, which comes into the picture during commit time. It is important to understand the context around it.
When a commit is invoked, it proceeds in two phases:

Prepare phase: mark the transaction as PREPARE, updating the undo segment to capture the updated state.

If bin-log is enabled, redo changes are not persisted immediately. Instead, a batch flush is done during the Group Commit flush stage.
If bin-log is disabled, then redo changes are persisted immediately.

Commit phase: Mark the transaction commit in memory.

If bin-log is enabled, the Group Commit optimization kicks in, causing a flush of the redo logs (which persists changes done to db-objects + the PREPARE state of the transaction), followed by a flush of the binary logs. Since the binary logs are flushed, the redo log capturing of the transaction commit doesn't need to flush immediately (saving an fsync). (A group-commit tuning sketch follows this list.)
If bin-log is disabled, redo logs are flushed on completion of the transaction to persist the updated commit state of the transaction.
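As a side note, the batching behavior of MySQL group commit itself is tunable. A sketch using the stock MySQL 5.7 variables (shown here with their default values; these are generic MySQL settings, not Percona XtraDB Cluster specific):

mysql> show global variables like 'binlog_group_commit_sync%';
mysql> set global binlog_group_commit_sync_delay=0;           -- microseconds the leader waits to grow a flush batch
mysql> set global binlog_group_commit_sync_no_delay_count=0;  -- transaction count that cancels the delay early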

What is a Monitor in Percona XtraDB Cluster World?
Monitors help maintain transaction ordering. For example, the Commit Monitor ensures that no transaction with a global-seqno greater than the current commit-processing transaction’s global seqno is allowed to proceed.
How Percona XtraDB Cluster Commits a Transaction
Percona XtraDB Cluster follows the existing MySQL semantics of course, but has its own step to commit the transaction in the replication world. There are two important themes:

Apply/Execution of transaction can proceed in parallel
Commit is serialized based on cluster-wide global seqno.

Let’s understand the commit flow with Percona XtraDB Cluster involved (Percona XtraDB Cluster registers wsrep as an additional storage engine for replication).

Prepare phase:

wsrep prepare: executes two main actions:

Replicate the transaction (adding the write-set to group-channel)
Entering the Commit Monitor, thereby enforcing ordering of transactions.

binlog prepare: nothing significant (for this flow).
innobase prepare: mark the transaction in PREPARE state.

As discussed above, the persistence of the REDO log depends on if the binlog is enabled/disabled.

Commit phase

If bin-log is enabled

MySQL Group Commit Logic kicks in. The semantics ensure that the order of transaction commit is the same as the order of them getting added to the flush-queue of the group-commit.

If bin-log is disabled

Normal commit action for all registered storage engines is called with immediate persistence of redo log.

Percona XtraDB Cluster then invokes the post_commit hook, thereby releasing the Commit Monitor so that the next transaction can make progress.

With that understanding, let’s look at the problems and solutions:
PROBLEM-1:
Commit Monitor is exercised such that the complete commit operation is serialized. This limits the parallelism associated with the prepare-stage. With log-bin enabled, this is still ok since redo logs are flushed at group-commit flush-stage (starting with 5.7). But if log-bin is disabled, then each commit causes an independent redo-log-flush (in turn probable fsync).
OPTIMIZATION-1:
Split the replication pre-commit hook into two explicit actions: replicate (add write-set to group-channel) + pre-commit (enter commit-monitor).
The replicate action is performed just like before (as part of storage engine prepare). That helps complete the InnoDB prepare action in parallel (unlocking much-needed parallelism in REDO flush with log-bin disabled).
On completion of replication, the pre-commit hook is called. That leads to entering the Commit Monitor for enforcing the commit ordering of the transactions. (Note: Replication action assigns the global seqno. So even if a transaction with a higher global seqno finishes the replication action earlier (due to CPU scheduling) than the transaction with a lower global seqno, it will wait in the pre-commit hook.)
Improved parallelism in the innodb-prepare stage helps accelerate log-bin enabled flow, and the same improved parallelism significantly helps in the log-bin disabled case by reducing redo-flush contention, thereby reducing fsyncs.

PROBLEM-2:
MySQL Group Commit already has a concept of ordering transactions based on the order of their addition to the GROUP COMMIT queue (FLUSH STAGE queue to be specific). Commit Monitor enforces the same, making the action redundant but limiting parallelism in MySQL Group Commit Logic (including redo-log flush that is now delayed to the flush stage).
With the existing flow (due to the involvement of Commit Monitor), only one transaction can enter the GROUP COMMIT Queue, thereby limiting optimal use of Group Commit Logic.
OPTIMIZATION-2:
Release the Commit Monitor once the transaction is successfully added to flush-stage of group-commit. MySQL will take it from there to maintain the commit ordering. (We call this interim-commit.)
Releasing the Commit Monitor early helps other transactions to make progress and real MySQL Group Commit Leader-Follower Optimization (batch flushing/sync/commit) comes into play.
This also helps ensure batch REDO log flushing.

PROBLEM-3:
This problem is specific to when the log-bin is disabled. Percona XtraDB Cluster still generates the log-bin, as it needs it for forming a replication write-set (it just doesn’t persist this log-bin information). If disk space is not a constraint, then I would suggest operating Percona XtraDB Cluster with log-bin enabled.
With log-bin disabled, OPTIMIZATION-1 is still relevant, but OPTIMIZATION-2 isn’t, as there is no group-commit protocol involved. Instead, MySQL ensures that the redo-log (capturing state change of transaction) is persisted before reporting COMMIT as a success. As per the original flow, the Commit Monitor is not released till the commit action is complete.
OPTIMIZATION-3:
The transaction is already committed to memory and the state change is captured. This is about persisting the REDO log only (REDO log modification is already captured by mtr_commit). This means we can release the Commit Monitor just before the REDO flush stage kicks in. Correctness is still ensured as the REDO log flush always persists the data sequentially. So even if trx-1 loses its slots before the flush kicks in, and trx-2 is allowed to make progress, trx-2’s REDO log flush ensures that trx-1’s REDO log is also flushed.

Conclusion
With these three main optimizations, and some small tweaks, we have tuned Percona XtraDB Cluster to scale better and made it fast enough for the growing demands of your applications. All of this is available with the recently released Percona XtraDB Cluster 5.7.17-29.20. Give it a try and watch your application scale in a multi-master environment, making Percona XtraDB Cluster the best fit for your HA workloads.

Performance improvements in Percona XtraDB Cluster 5.7.17-29.20

In our latest release of Percona XtraDB Cluster, we've introduced major performance improvements to the MySQL write-set replication layer. In this post, we want to show what these improvements look like.
For the test, we used the sysbench OLTP_RW, UPDATE_KEY and UPDATE_NOKEY workloads with 100 tables of 4 million rows each, which gives about 100GB of data size. In all the tests we used a three-node setup, connected via a 10GB network, with the sysbench load directed to one primary node.
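For reference, a sysbench 1.0 invocation matching this setup might look like the following sketch (host, credentials, and duration are placeholders):

sysbench oltp_read_write --tables=100 --table-size=4000000 \
--mysql-host=node1 --mysql-user=sbtest --mysql-password=sbtest \
--threads=64 --time=300 run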
In the first chart, we show improvements compared to the previous version (5.7.16):

The main improvements come from concurrent workloads, under multiple threads.
The previous chart is for cases with binary logs enabled, but some deployments run without binary logs (Percona XtraDB Cluster does not require them). The latest release significantly improves performance for this case as well.
Here is a chart showing throughput without binary logs:

Where does Percona XtraDB Cluster place in comparison with similar technologies? To find out, we’ll compare this release with MySQL 5.7.17 Group Replication and with the recently released MariaDB 10.2.5 RC.
For MySQL 5.7.17 Group Replication, I’ve reviewed two cases: “durable” with sync_binlog=1, and “relaxed durability” with sync_binlog=0.
Also for MySQL 5.7.17 Group Replication, we want to review two cases with different flow_control settings. The first setting is flow_control=25000 (the default setting). It provides better performance, but with the drawbacks that non-primary nodes will fall behind significantly and MySQL Group Replication does not provide a way to protect from reading stale data. So with a default flow_control=25000, we risk reading very outdated data. We also tested MySQL Group Replication with flow_control=1000 to minimize stale data on non-primary nodes.
A note on the Flow Control topic: it is worth mentioning that we also changed the flow_control default for Percona XtraDB Cluster. The default value is 100 instead of 16 (as in version 5.7.16).
Comparison chart with sync_binlog=1 (for MySQL Group Replication):

Comparison chart with sync_binlog=0 (for MySQL Group Replication):

So there are a couple of conclusions we can make from these charts:

The new version of Percona XtraDB Cluster performs on a par with MySQL Group Replication.
flow_control for MySQL Group Replication really makes a difference for performance, and the default flow_control=25000 is better (with the risk of a lot of outdated data on non-primary nodes).

For reference, our benchmark files and config files are here.

Percona XtraDB Cluster 5.7.17-29.20 is now available

Percona announces the release of Percona XtraDB Cluster 5.7.17-29.20 on April 19, 2017. Binaries are available from the downloads section or our software repositories.
NOTE: You can also run Docker containers from the images in the Docker Hub repository.
Percona XtraDB Cluster 5.7.17-29.20 is now the current release, based on the following:

Percona Server 5.7.17-13
Galera Replication library 3.20
wsrep API version 29

All Percona software is open-source and free.
Performance Improvements
This release is focused on performance and scalability with increasing workload threads. Tests show up to 10 times increase in performance.
Fixed Bugs

Updated semantics for gcache page cleanup to trigger when either gcache.keep_pages_size or gcache.keep_pages_count exceeds the limit, instead of both at the same time.
Added support for passing the XtraBackup buffer pool size with the use-memory option under [xtrabackup] and the innodb_buffer_pool_size option under [mysqld] when the –use-memory option is not passed with the inno-apply-opts option under [sst].
Fixed gcache page cleanup not triggering when limits are exceeded.
Improved SST and IST log messages for better readability and unification.
Excluded the garbd node from flow control calculations.
Added extra checks to verify that SSL files (certificate, certificate authority, and key) are compatible before opening a connection.
Improved parallelism for better scaling with multiple threads.
Added validations for DISCARD TABLESPACE and IMPORT TABLESPACE in PXC Strict Mode to prevent data inconsistency.
Added the wsrep_flow_control_status variable to indicate if node is in flow control (paused).

PXC-766: Added the wsrep_ist_receive_status variable to show progress during an IST.
Allowed CREATE TABLE … AS SELECT (CTAS) statements with temporary tables (CREATE TEMPORARY TABLE … AS SELECT) in PXC Strict Mode. For more information, see 1666899.

PXC-782: Updated xtrabackup-v2 script to use the tmpdir option (if it is set under [sst], [xtrabackup] or [mysqld], in that order).

PXC-783: Improved the wsrep stage framework.

PXC-784: Fixed the pc.recovery procedure to abort if the gvwstate.dat file is empty or invalid, and fall back to normal joining process. For more information, see 1669333.

PXC-794: Updated the sockopt option to include a comma at the beginning if it is not set by the user.

PXC-795: Set --parallel=4 as the default option for wsrep_sst_xtrabackup-v2 to run four threads with XtraBackup.

PXC-797: Blocked wsrep_desync toggling while node is paused to avoid halting the cluster when running FLUSH TABLES WITH READ LOCK. For more information, see 1370532.

PXC-805: Inherited upstream fix to avoid using deprecated variables, such as INFORMATION_SCHEMA.SESSION_VARIABLE. For more information, see 1676401.

PXC-811: Changed default values for the following variables:

fc_limit from 16 to 100

send_window from 4 to 10

user_send_window from 2 to 4

Moved wsrep settings into a separate configuration file (/etc/my.cnf.d/wsrep.cnf).
Fixed mysqladmin shutdown to correctly stop the server on systems using systemd.
Fixed several packaging and dependency issues.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Percona XtraBackup 2.3.8 is Now Available

Percona announces the release of Percona XtraBackup 2.3.8 on April 18, 2017. Downloads are available from our download site or Percona Software Repositories.
Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.
This release is the current GA (Generally Available) stable release in the 2.3 series.
New Features

Percona XtraBackup now uses hardware accelerated implementation of crc32 where it is supported.

Percona XtraBackup has implemented new options: --tables-exclude and --databases-exclude, which work similarly to the --tables and --databases options but exclude given names/paths from the backup.

The xbstream binary now supports parallel extraction with the --parallel option.

The xbstream binary now supports the following new options: --decrypt, --encrypt-threads, --encrypt-key, and --encrypt-key-file. When the --decrypt option is specified, xbstream will automatically decrypt encrypted files when extracting the input stream. Either the --encrypt-key or the --encrypt-key-file option must be specified to provide the encryption key, but not both. The --encrypt-threads option specifies the number of worker threads doing the encryption; the default is 1. (A combined usage sketch follows this list.)
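Putting those xbstream options together, a decrypt-and-extract run might look like this sketch (the key file and paths are illustrative):

xbstream -x --parallel=4 --decrypt=AES256 \
--encrypt-key-file=/etc/mysql/backup.key --encrypt-threads=4 \
-C /data/restore < backup.xbstream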

Bugs Fixed:

xtrabackup would not create fresh InnoDB redo logs when preparing an incremental backup. Bug fixed #1669592.

xtrabackup --export can now export tables that have more than 31 indexes. Bug fixed #1089681.
An "Unrecognized character \x01; marked by" message could be seen if backups were taken with the version check enabled. Bug fixed #1651978.

Release notes with all the bugfixes for Percona XtraBackup 2.3.8 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.
