Author: MySQL Performance Blog

Verifying Query Performance Using ProxySQL

In this blog post, we’ll look at how you can verify query performance using ProxySQL.
In the previous blog post, I showed you how much information you can get from the “stats.stats_mysql_query_digest” table in ProxySQL. I also mentioned that you could even collect and graph these metrics. I will show you that this is not just theory — it is entirely possible.
These graphs can be very useful for understanding the impact of the changes you make on query count or execution time.
I used our all-time favorite benchmark tool, Sysbench, and ran the following query:
UPDATE sbtest1 SET c=? WHERE k=?
There was no index on “k” when I started the test. During the test, I added an index. We expect to see some changes in the graphs.
I selected from “stats.stats_mysql_query_digest” into a file every second, then used Percona Monitoring and Management (PMM) to create graphs from the metrics. (I am going to write another blog post on how you can use PMM to create graphs from any kind of metrics.)
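The per-second collection described above boils down to diffing cumulative counters. Here is a minimal sketch, assuming two snapshots of one digest row using ProxySQL's count_star and sum_time columns (sum_time is cumulative microseconds); it turns them into per-interval rates suitable for graphing:

```python
def digest_delta(prev, curr, interval_s=1.0):
    """Turn two cumulative stats_mysql_query_digest snapshots into
    per-interval rates: queries per second and average execution time."""
    count = curr["count_star"] - prev["count_star"]
    time_us = curr["sum_time"] - prev["sum_time"]
    return {
        "qps": count / interval_s,
        # Average execution time in microseconds for queries in this interval
        "avg_exec_us": (time_us / count) if count else 0.0,
    }
```

Feeding one such dict per second into a time-series store is all that is needed to produce graphs like the ones below.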

Without the index, the update was running only 2-3 times per second. After adding the index, it went up to 400-500 per second. We can see the results immediately on the graph.
Let’s see the average execution time:

Without the index, it took 600,000-700,000 microseconds, which is around 0.7s. After adding an index, it dropped to 0.01s. This is a big win, but more importantly, we can see the effect on query response time and query count whenever we make changes to the schema, query or configuration.
If you already have a ProxySQL server collecting and graphing these metrics, they could be quite useful when you are optimizing your queries. They can help make sure you are moving in the right direction with your tunings/modifications.

The Multi-Source GTID Replication Maze

In this blog post, we’ll look at how to navigate some of the complexities of multi-source GTID replication.
GTID replication is often a real challenge for DBAs, especially when it involves multi-source GTID replication. A while back, I came across a really interesting customer environment with shards where multi-master, multi-source, multi-threaded MySQL 5.6 MIXED replication was active. This is a highly complex environment that has both pros and cons, introducing risks as a trade-off for specific customer requirements.
This is the set up of part of this environment:

I started looking into this setup when a statement broke replication between db1 and db10. Replication broke due to a statement executed on a schema that was not present on db10. This also resulted in changes originating from db1 not being pushed down to db100 via db10, since we had stopped the replication thread (for the db1 channel).

On the other hand, replication was not stopped on db2 because the schema in question was present on db2. Replication between db2 and db20 was broken as well because the schema was not present in db20.

In order to fix db1->db10 replication, four GTID sets were injected in db10.
Here are some interesting blog posts regarding how to handle/fix GTID replication issues:

After injecting the GTID sets, we started replication again and everything ran fine.
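The standard way to inject a GTID is an empty transaction: the slave records the GTID as executed without applying anything. A minimal sketch that generates the required statements (the GTID value passed in is just a placeholder):

```python
def skip_gtid_statements(gtid):
    """Generate the SQL needed to inject an empty transaction for one GTID,
    so the slave marks it as executed without applying any change."""
    return [
        f"SET GTID_NEXT='{gtid}';",
        "BEGIN;",
        "COMMIT;",  # the empty transaction is recorded in Executed_Gtid_Set
        "SET GTID_NEXT='AUTOMATIC';",
    ]
```

Running these on the slave for each offending GTID lets replication resume past the broken statement.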

After that, we had to check the db2->db20 replication, which, as I’ve already said, was broken as well. In this case, injecting only the first GTID trx into db20 instead of all of those causing issues on db10 was enough!

You may wonder how this is possible, right? The answer is that the rest of them were replicated from db10 to db20, even though the channel was not the same.

Another strange thing is that although the replication thread for the db2->db20 channel was stopped (broken), checking the slave status on db20 showed that Executed_Gtid_Set was moving for all channels, even though Retrieved_Gtid_Set for the broken one was stopped! So what was happening there?
This raised my curiosity, so I decided to do some further investigation and created scenarios regarding other strange things that could happen. An interesting one was about the replication filters. In our case, I thought “What would happen in the following scenario … ?”
Let’s say we write a row from db1 to db123.table789. This row is replicated to db10 (say, using channel 1) and to db2 (say, using channel 2). On channel 1, we filter out the db123.% tables; on channel 2 we don’t. db1 writes the row and the entry to the binary log. db2 writes the row after reading the entry from the binary log, subsequently writes the entry to its own binary log, and replicates this change to db20. This change is also replicated to db10. So now, on db10 (depending on which channel sees the GTID first), the transaction either gets filtered on channel 1 and written to db10’s own binary log as just start…commit with the actual DDL/DML removed, or, if it is read first on channel 2 (db1->db2 and then db20->db10), it is NOT filtered out and is executed instead. Is this correct? It definitely ISN’T!
Points of interest
You can find answers to the above questions in the points of interest listed below. Although it’s not really clear from the official documentation, this is what happens with GTID replication and multi-source GTID replication:

As we know, GTID sets are unique across all nodes in a given cluster. In multi-source replication, Executed_Gtid_Set is common for all channels. This means that regardless of the originating channel, when a GTID transaction is executed it is recorded in every channel’s Executed_Gtid_Set. Although this is logical (each database is unique, so a transaction affecting a database shouldn’t be tied to a single channel), the documentation doesn’t provide much information about it.
When we have multi-source, multi-level replication, there are cases where GTID sets originating from one master can end up on one slave via different replication paths. It’s not clear whether any special algorithm is applied (it doesn’t seem that there is one); the method appears to be simply FIFO: the fastest wins! This means that GTID sets can travel to the slave via different channels, depending on how fast the upper-level slaves can commit changes. In fact, the path doesn’t really matter, as each GTID transaction is only executed once.
Replication filters are global, regardless of the channel. This means each filter applies to all channels. This is expected, as we can’t define a replication filter per channel. In order to be able to debug such cases, adding a small replication delay per channel seems a good idea.
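The behavior described in these points can be illustrated with a toy simulation. Assuming a shared Executed_Gtid_Set, whichever channel delivers a GTID first wins, and a later arrival of the same GTID via another path is skipped (the channel and GTID names below are made up):

```python
def apply_events(events):
    """events: list of (channel, gtid) in arrival order.
    Executed_Gtid_Set is shared across all channels, so a GTID that
    already arrived via one path is skipped when it arrives via another."""
    executed = set()   # the shared Executed_Gtid_Set
    applied = []
    for channel, gtid in events:
        if gtid in executed:
            continue  # already executed via a faster channel: FIFO wins
        executed.add(gtid)
        applied.append((channel, gtid))
    return applied
```

This mirrors why Executed_Gtid_Set on db20 kept moving for all channels even while one channel's Retrieved_Gtid_Set was stopped.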

Sneak Peek at Proxytop Utility

In this blog post, I’ll be looking at a new tool, Proxytop, for managing MySQL topologies using ProxySQL. Proxytop is a self-contained, real-time monitoring tool for ProxySQL. As some of you already know, ProxySQL is a popular open source, high performance and protocol-aware proxy server for MySQL and its forks (Percona Server and MariaDB).
My lab uses MySQL and ProxySQL on Docker containers provided by Nick Vyzas. This lab also uses Alexey Kopytov’s Sysbench utility to perform benchmarking against ProxySQL.

Install Proxytop

Install Sysbench

Installation of Proxytop is pretty straightforward:
## You may first need to install system Python and MySQL dev packages
## e.g. "sudo apt install python-dev libmysqlclient-dev"
pip install MySQL-python npyscreen
wget -P /usr/bin
At this stage, we have everything we need to demonstrate Proxytop. The lab we have set up provides a bunch of bash scripts to generate load for reruns. I’m using the following script under the bin directory:
[root@localhost docker-mysql-proxysql]# ./bin/docker-benchmark.bash
[Fri Feb 16 10:19:58 BRST 2018] Dropping 'sysbench' schema if present and preparing test dataset:mysql: [Warning] Using a password on the command line interface can be insecure.
[Fri Feb 16 10:19:58 BRST 2018] Running Sysbench Benchmarks against ProxySQL:sysbench 1.0.12 (using bundled LuaJIT 2.1.0-beta2)
This script is fully customizable; the benchmark parameters can be tuned within the script:
Now let’s take a look at the Proxytop utility. It has a menu-driven style, similar to Innotop. Once you are in the tool, use [tab] to toggle between screens. Various shortcuts are also available for things like changing the sort order (‘s’), filtering on specific criteria (‘l’) or changing the refresh interval for the current view (‘+’ / ‘-’).
Currently, it supports viewing the following aspects of a ProxySQL instance:

ConnPool – “ProxySQL Connection Pool” statistics
QueryRules – “ProxySQL Query Rules” statistics and definitions
GloStat – “ProxySQL Global Status” statistics
ProcList – “ProxySQL Processlist” for all incoming DML / DQL
ComCount – “ProxySQL Command Counter” statistics

We’ll go over each of these screens in detail.
ConnPool Screen:
This screen basically shows the Connection Pool, specifically:

MySQL hostname and port
Assigned ProxySQL hostgroup
Connection statistics: Used / Free / OK / Error
MySQL Server state in ProxySQL i.e. ONLINE / OFFLINE / etc.
MySQL Server latency

Query Rules Screen:
This screen shows query rules and their use by count, and can be sorted either by rule_id or hits (ascending or descending) by cycling through the ordering list by pressing “s”.

It also allows you to view the actual definition of each rule by selecting and entering a rule. In the popup window, you will find a list of the relevant and defined columns for the query rule. For example:

If you have a lot of query rules defined, you can filter on a specific rule by pressing the letter “l”:

Global Statistics Screen: This screen shows Global Statistics from ProxySQL divided into four sections.

Connection Information
Prepared Statement Information
Command Information
Query Cache information

Proclist Screen: In this screen, we’re able to see running active queries, with a minimum refresh interval of five seconds. This way you can monitor long-running queries in flight for troubleshooting:

ComCount Screen: This screen shows all command types executed with the total time and counts for each type, and also provides a drill-down to view the number of queries executed within specific time ranges. This way, the type of workload can be easily identified during both testing and production:

You can drill down on each Com type by using the arrow keys and hitting the Enter key:

We all know the power of command-line utilities such as proxysql-admin. The proxysql-admin utility is designed for configuration and ad-hoc monitoring of ProxySQL, as explained in this blog post. Proxytop, by contrast, is menu driven and repeats commands at intervals. You can easily monitor and administer ProxySQL from the command line, but sometimes running repetitive commands and monitoring over a period of time is annoying. This tool helps with that situation.

Visualize This! MySQL Tools That Explain Queries

In this blog post, I want to go over some of the day-to-day MySQL tools and methods DBAs use to analyze queries and visualize “what is going on?” I won’t be going into the nitty-gritty details of each of these tools; I just want to introduce you to them and show you what they look like, so you will know what types of information they provide.
This isn’t a comprehensive list by any means, but consider it a primer for those starting with MySQL who want to know what a query is going to do or is doing.
The two sides of query analysis are examining a query BEFORE you run it, and then analyzing what actually happened AFTER you run it.
Let’s start with the tools you can use to predict a query’s future.
In the beginning, there was EXPLAIN. The venerable EXPLAIN command has been with us a long time as a built-in MySQL utility statement. Its purpose is to describe what the optimizer predicts is the best “plan” for a query. It tells us where data is gathered from, how it is filtered, how it is combined or aggregated, and much more:
> EXPLAIN SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer, address.phone, film.title     FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id     INNER JOIN address ON customer.address_id = address.address_id     INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id     INNER JOIN film ON inventory.film_id = film.film_id     WHERE rental.return_date IS NULL     AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE()     LIMIT 5\G
*************************** 1. row ***************************
          id: 1
select_type: SIMPLE
      table: film
        type: ALL
possible_keys: PRIMARY
        key: NULL
    key_len: NULL
        ref: NULL
        rows: 1000
      Extra: NULL
*************************** 2. row ***************************
          id: 1
select_type: SIMPLE
      table: inventory
        type: ref
possible_keys: PRIMARY,idx_fk_film_id
        key: idx_fk_film_id
    key_len: 2
        rows: 1
      Extra: Using index
*************************** 3. row ***************************
          id: 1
select_type: SIMPLE
      table: rental
        type: ref
possible_keys: idx_fk_inventory_id,idx_fk_customer_id
        key: idx_fk_inventory_id
    key_len: 3
        ref: sakila.inventory.inventory_id
        rows: 1
      Extra: Using where
*************************** 4. row ***************************
          id: 1
select_type: SIMPLE
      table: customer
        type: eq_ref
possible_keys: PRIMARY,idx_fk_address_id
        key: PRIMARY
    key_len: 2
        ref: sakila.rental.customer_id
        rows: 1
      Extra: NULL
*************************** 5. row ***************************
          id: 1
select_type: SIMPLE
      table: address
        type: eq_ref
possible_keys: PRIMARY
        key: PRIMARY
    key_len: 2
        ref: sakila.customer.address_id
        rows: 1
      Extra: NULL
Looks a little cryptic though, doesn’t it? Other people thought so as well. Other MySQL tools were designed to help us get a better handle on what exactly the optimizer is planning to do.
Percona released pt-visual-explain to help us have a better representation:
+- Bookmark lookup
| +- Table
| | table         address
| | possible_keys PRIMARY
| +- Unique index lookup
|     key           address->PRIMARY
|     possible_keys PRIMARY
|     key_len       2
|     ref           sakila.customer.address_id
|     rows           1
  +- Bookmark lookup
  | +- Table
  | | table         customer
  | | possible_keys PRIMARY,idx_fk_address_id
  | +- Unique index lookup
  |     key           customer->PRIMARY
  |     possible_keys PRIMARY,idx_fk_address_id
  |     key_len       2
  |     ref           sakila.rental.customer_id
  |     rows           1
  +- JOIN
    +- Filter with WHERE
    | +- Bookmark lookup
    |     +- Table
    |     | table         rental
    |     | possible_keys idx_fk_inventory_id,idx_fk_customer_id
    |     +- Index lookup
    |       key           rental->idx_fk_inventory_id
    |       possible_keys idx_fk_inventory_id,idx_fk_customer_id
    |       key_len       3
    |       ref           sakila.inventory.inventory_id
    |       rows           1
    +- JOIN
        +- Index lookup
        | key           inventory->idx_fk_film_id
        | possible_keys PRIMARY,idx_fk_film_id
        | key_len       2
        | ref 
        | rows           1
        +- Table scan
          rows           1000
          +- Table
              table         film
              possible_keys PRIMARY
The pt-visual-explain documentation justifies the output as follows:
pt-visual-explain reverse-engineers MySQL’s EXPLAIN output into a query execution plan, which it then formats as a left-deep tree – the same way the plan is represented inside MySQL. It is possible to do this by hand, or to read EXPLAIN’s output directly, but it requires patience and expertise. Many people find a tree representation more understandable.
The folks that develop MySQL Workbench tried to give an actual visual explanation in their tool. It creates graphics of the predicted workflow, and changes the colors of different steps to highlight expensive parts of the query:

And finally, MySQL itself can output extended EXPLAIN information and always does so if you use the JSON output format (introduced in MySQL 5.6):
> EXPLAIN format=JSON SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer, address.phone, film.title     FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id     INNER JOIN address ON customer.address_id = address.address_id     INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id     INNER JOIN film ON inventory.film_id = film.film_id     WHERE rental.return_date IS NULL     AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE()     LIMIT 5\G
*************************** 1. row ***************************
"query_block": {
  "select_id": 1,
  "nested_loop": [
      "table": {
        "table_name": "film",
        "access_type": "ALL",
        "possible_keys": [
        "rows": 1000,
        "filtered": 100
      "table": {
        "table_name": "inventory",
        "access_type": "ref",
        "possible_keys": [
        "key": "idx_fk_film_id",
        "used_key_parts": [
        "key_length": "2",
        "ref": [
        "rows": 1,
        "filtered": 100,
        "using_index": true
      "table": {
        "table_name": "rental",
        "access_type": "ref",
        "possible_keys": [
        "key": "idx_fk_inventory_id",
        "used_key_parts": [
        "key_length": "3",
        "ref": [
        "rows": 1,
        "filtered": 100,
        "attached_condition": "(isnull(`sakila`.`rental`.`return_date`) and ((`sakila`.`rental`.`rental_date` + interval `film`.`rental_duration` day) < <cache>(curdate())))"
      "table": {
        "table_name": "customer",
        "access_type": "eq_ref",
        "possible_keys": [
        "key": "PRIMARY",
        "used_key_parts": [
        "key_length": "2",
        "ref": [
        "rows": 1,
        "filtered": 100
      "table": {
        "table_name": "address",
        "access_type": "eq_ref",
        "possible_keys": [
        "key": "PRIMARY",
        "used_key_parts": [
        "key_length": "2",
        "ref": [
        "rows": 1,
        "filtered": 100
1 row in set, 1 warning (0.00 sec)
Don’t worry if EXPLAIN looks a little daunting. EXPLAIN can be hard to explain. Just know that it tells you what it thinks is going to happen and in what order. You can familiarize yourself with it as you go along.
Now let’s say you’ve executed your query and want to find out what actually happened. Or maybe you have a server that is running a lot of queries and you want to visualize what is going on with this server. It’s time to examine the tools we use to analyze AFTER running queries.
Similar to EXPLAIN, MySQL has tools built into it to help you understand what happened after a query was run. Query profiling tells us what the query spent its time doing. You can get a profile directly from the MySQL console:
> set profiling=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

> SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer, address.phone, film.title     FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id     INNER JOIN address ON customer.address_id = address.address_id     INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id     INNER JOIN film ON inventory.film_id = film.film_id     WHERE rental.return_date IS NULL     AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE()     LIMIT 5;
| customer       | phone       | title           |
| HUEY, BRANDON | 99883471275 | ACE GOLDFINGER   |
| HANNON, SETH   | 864392582257 | AFRICAN EGG     |
| COLE, TRACY   | 371490777743 | ALI FOREVER     |
5 rows in set (0.00 sec)

> show profile;
| Status               | Duration |
| starting             | 0.000122 |
| checking permissions | 0.000006 |
| checking permissions | 0.000004 |
| checking permissions | 0.000003 |
| checking permissions | 0.000003 |
| checking permissions | 0.000005 |
| Opening tables       | 0.000026 |
| init                 | 0.000044 |
| System lock         | 0.000013 |
| optimizing           | 0.000020 |
| statistics           | 0.000058 |
| preparing           | 0.000024 |
| executing           | 0.000004 |
| Sending data         | 0.001262 |
| end                 | 0.000008 |
| query end           | 0.000006 |
| closing tables       | 0.000010 |
| freeing items       | 0.000021 |
| cleaning up         | 0.000014 |
19 rows in set, 1 warning (0.00 sec)
Or using performance_schema:
mysql> UPDATE performance_schema.threads SET INSTRUMENTED = 'NO';
mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES'
      WHERE NAME LIKE '%statement/%';

mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES'
      WHERE NAME LIKE '%stage/%';

mysql> UPDATE performance_schema.setup_consumers SET ENABLED = 'YES'
      WHERE NAME LIKE '%events_statements_%';

mysql> UPDATE performance_schema.setup_consumers SET ENABLED = 'YES'
      WHERE NAME LIKE '%events_stages_%';
> SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer, address.phone, film.title     FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id     INNER JOIN address ON customer.address_id = address.address_id     INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id     INNER JOIN film ON inventory.film_id = film.film_id     WHERE rental.return_date IS NULL     AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE()     LIMIT 5;
| customer       | phone       | title           |
| HUEY, BRANDON | 99883471275 | ACE GOLDFINGER   |
| HANNON, SETH   | 864392582257 | AFRICAN EGG     |
| COLE, TRACY   | 371490777743 | ALI FOREVER     |
5 rows in set (0.00 sec)    

mysql> SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SUBSTRING(SQL_TEXT,1,25) as sql_text
      FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%rental%';
| event_id | duration | sql_text                                               |
|       31 | 0.028302 | SELECT CONCAT(                             |
mysql> SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration
      FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=31;
| Stage                         | Duration |
| stage/sql/starting             | 0.000080 |
| stage/sql/checking permissions | 0.000005 |
| stage/sql/Opening tables       | 0.027759 |
| stage/sql/init                 | 0.000052 |
| stage/sql/System lock         | 0.000009 |
| stage/sql/optimizing           | 0.000006 |
| stage/sql/statistics           | 0.000082 |
| stage/sql/preparing           | 0.000008 |
| stage/sql/executing           | 0.000000 |
| stage/sql/Sending data         | 0.000017 |
| stage/sql/end                 | 0.000001 |
| stage/sql/query end           | 0.000004 |
| stage/sql/closing tables       | 0.000006 |
| stage/sql/freeing items       | 0.000272 |
| stage/sql/cleaning up         | 0.000001 |
This can be helpful if a query plan looks “good” but things are taking too long. You can find out if your query is spending time locked or compiling statistics, etc.
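A note on the arithmetic in the queries above: performance_schema timers report in picoseconds, which is why TIMER_WAIT is divided by 1000000000000 (10^12) to get seconds. A quick sketch of the same conversion:

```python
def timer_wait_to_seconds(timer_wait_ps, decimals=6):
    """Mirror TRUNCATE(TIMER_WAIT/1000000000000, 6): performance_schema
    timers are in picoseconds; truncate (not round) to `decimals` places."""
    factor = 10 ** decimals
    # Integer arithmetic first to avoid floating-point surprises
    return int(timer_wait_ps * factor // 10**12) / factor
```

So a statement with TIMER_WAIT of 28,302,000,000 picoseconds comes out as 0.028302 seconds, matching the Duration column in the output above.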
You can also find out “how much” of something was going on for a given query by looking at the handler statistics:
Query OK, 0 rows affected (0.00 sec)

> SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer, address.phone, film.title     FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id     INNER JOIN address ON customer.address_id = address.address_id     INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id     INNER JOIN film ON inventory.film_id = film.film_id     WHERE rental.return_date IS NULL     AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE()     LIMIT 5;
| customer       | phone       | title           |
| HUEY, BRANDON | 99883471275 | ACE GOLDFINGER   |
| HANNON, SETH   | 864392582257 | AFRICAN EGG     |
| COLE, TRACY   | 371490777743 | ALI FOREVER     |
5 rows in set (0.00 sec)

> show status like 'Handler%';
| Variable_name             | Value |
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock     | 10   |
| Handler_mrr_init           | 0     |
| Handler_prepare           | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 94   |
| Handler_read_last         | 0     |
| Handler_read_next         | 293   |
| Handler_read_prev         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next     | 33   |
| Handler_rollback           | 0     |
| Handler_savepoint         | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write             | 19   |
18 rows in set (0.00 sec)
This gives us a glimpse of how many times MySQL had to do certain things while running a query. For instance, “Handler_read_rnd_next” was used 33 times. We can look up what this means in the documentation and gain insight into what is happening.
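Since handler counters are cumulative for the session, the usual trick is to snapshot SHOW STATUS LIKE 'Handler%' before and after the query and diff the two. A minimal sketch, assuming the snapshots have already been read into dicts:

```python
def handler_diff(before, after):
    """Diff two SHOW STATUS LIKE 'Handler%' snapshots taken around a query;
    only counters that changed tell us what the query actually did."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }
```

Running this against the snapshots around the example query would surface only the handlers the query exercised (reads by key, sequential reads, and so on).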
These tools allow us to have a better understanding of what happened when a query was executed. But again they only help us analyze a single query. If you really want to see a bigger picture of an overall server workload and what queries are doing in production, you need to bring out the big guns.
Percona Toolkit offers pt-query-digest. This tool ingests a slow query log from the server and analyzes it to your specifications. Its output has some visualization (tabulation) that gives you a better idea of what a server is spending its time doing, and it offers a breakdown of the individual queries with real-world examples.
A top-level overview:
# 8.1s user time, 60ms system time, 26.23M rss, 62.49M vsz
# Current date: Thu Dec 29 07:09:32 2011
# Hostname:
# Files: slow-query.log.1
# Overall: 20.08k total, 167 unique, 16.04 QPS, 0.01x concurrency ________
# Time range: 2011-12-28 18:42:47 to 19:03:39
# Attribute         total     min     max     avg     95% stddev median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time             8s     1us   44ms   403us   541us     2ms   98us
# Lock time         968ms       0   11ms   48us   119us   134us   36us
# Rows sent       105.76k       0   1000   5.39   9.83   32.69       0
# Rows examine     539.46k       0 15.65k   27.52   34.95 312.56       0
# Rows affecte       1.34k       0     65   0.07       0   1.35       0
# Rows read       105.76k       0   1000   5.39   9.83   32.69       0
# Bytes sent       46.63M     11 191.38k   2.38k   6.63k 11.24k 202.40
# Merge passes           0       0       0       0       0       0       0
# Tmp tables         1.37k       0     61   0.07       0   0.91       0
# Tmp disk tbl         490       0     10   0.02       0   0.20       0
# Tmp tbl size     72.52M       0 496.09k   3.70k       0 34.01k       0
# Query size         3.50M     13   2.00k 182.86 346.17 154.34   84.10
# InnoDB:
# IO r bytes       96.00k       0 32.00k   20.86       0 816.04       0
# IO r ops               6       0       2   0.00       0   0.05       0
# IO r wait           64ms       0   26ms   13us       0   530us       0
# pages distin     28.96k       0     48   6.29   38.53   10.74   1.96
# queue wait             0       0       0       0       0       0       0
# rec lock wai           0       0       0       0       0       0       0
# Boolean:
# Filesort       4% yes, 95% no
# Filesort on   0% yes, 99% no
# Full scan     4% yes, 95% no
# QC Hit         0% yes, 99% no
# Tmp table     4% yes, 95% no
# Tmp table on   2% yes, 97% no
An individual query overview:
# Query 1: 0.26 QPS, 0.00x concurrency, ID 0x92F3B1B361FB0E5B at byte 14081299
# This item is included in the report because it matches --limit.
# Scores: Apdex = 1.00 [1.0], V/M = 0.00
# Query_time sparkline: |   _^   |
# Time range: 2011-12-28 18:42:47 to 19:03:10
# Attribute   pct   total     min     max     avg     95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         1     312
# Exec time     50     4s     5ms   25ms   13ms   20ms     4ms   12ms
# Lock time     3   32ms   43us   163us   103us   131us   19us   98us
# Rows sent     59 62.41k     203     231 204.82 202.40   3.99 202.40
# Rows examine 13 73.63k     238     296 241.67 246.02   10.15 234.30
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read     59 62.41k     203     231 204.82 202.40   3.99 202.40
# Bytes sent   53 24.85M 46.52k 84.36k 81.56k 83.83k   7.31k 79.83k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables     0       0       0       0       0       0       0       0
# Tmp disk tbl   0       0       0       0       0       0       0       0
# Tmp tbl size   0       0       0       0       0       0       0       0
# Query size     0 21.63k     71     71     71     71       0     71
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait     0       0       0       0       0       0       0       0
# pages distin 40 11.77k     34     44   38.62   38.53   1.87   38.53
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Full scan   100% yes,   0% no
# String:
# Databases   wp_blog_one (264/84%), wp_blog_tw… (36/11%)… 1 more
# Hosts
# InnoDB trxID 86B40B (1/0%), 86B430 (1/0%), 86B44A (1/0%)… 309 more
# Last errno   0
# Users       wp_blog_one (264/84%), wp_blog_two (36/11%)… 1 more
# Query_time distribution
#   1us
# 10us
# 100us
#   1ms #################
# 10ms ################################################################
# 100ms
#   1s
# 10s+
# Tables
#   SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_options'\G
#   SHOW CREATE TABLE `wp_blog_one `.`wp_options`\G
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'\G
Use it if you have a representative example of a server’s workload and you’re trying to understand what queries are the most poorly performing or executed most often. You can look for outliers that cause problems on occasion and more.
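The “167 unique” figure in the overview above comes from pt-query-digest grouping queries by fingerprint: literals are stripped so that structurally identical queries collapse into one class. A crude sketch of that idea (pt-query-digest's real fingerprinting handles many more cases, such as IN-lists and comments):

```python
import re

def fingerprint(sql):
    """Collapse a query to a normalized class by replacing literals
    with '?' placeholders, so structurally identical queries group."""
    sql = re.sub(r"\s+", " ", sql.strip().lower())  # normalize whitespace/case
    sql = re.sub(r"'[^']*'", "?", sql)              # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)              # standalone numeric literals
    return sql
```

With this, `SELECT * FROM t WHERE id = 42` and `SELECT * FROM t WHERE id = 7` land in the same class and their statistics are aggregated together.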
Finally, in the modern world, we want all this data aggregated together, visualized and easily accessible. Everything from explains to statistics to profiles to digests, and we want it all compiled in a nice neat package. Enter Percona Monitoring and Management (PMM) Query Analytics. (Screenshots are from PMM v1.7, other versions may look different.)
After setup and configuration, this tool offers us a comprehensive visual representation of the things we’ve discussed and much more.
PMM QAN is able to offer us a big picture look at the queries a server is executing, when they’re being run, what is taking up a lot of time, and what the variance is for a user defined time frame. It does this all at a glance by offering sparkline graphs (timelines) and variance represented graphically:
Remember when I spoke about Handlers and Profiling? PMM also offers us an aggregated picture of similar information server wide with human-readable terminology:

By selecting a single query, you can “drill down” and get lots of details about that specific query being run on your server:

Along with immediate access to schema information and explain plans:

You can see PMM QAN in action (along with the rest of PMM’s features) at the demo site:
As you can see, there are many ways we can use MySQL tools to help us visualize what MySQL is doing with a query. Knowing what tools are available and what they can show you about your query can be helpful. Some of these are very quick and easy to use, such as the built-in MySQL utility statements. Others, like Percona Toolkit or Workbench, require installed software, and pt-query-digest usually needs a representative query log. PMM requires installation and configuration, but it provides the most detail and visualization.
MySQL tools we discussed:

MySQL Workbench
Query Profiling
Handler Statistics
PMM Query Analytics

Percona Toolkit 3.0.7 Is Now Available

Percona announces the release of Percona Toolkit 3.0.7 on March 1, 2018.
Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL®, MongoDB® and system tasks that are too difficult or complex to perform manually. With over 1,000,000 downloads, Percona Toolkit supports Percona Server for MySQL, MySQL, MariaDB®, Percona Server for MongoDB and MongoDB. Percona Toolkit, like all Percona software, is free and open source.
You can download packages from the website or install from official repositories.
This release includes the following changes:
New Features:

PT-633: Add the --mysql-only option to pt-stalk for RDS
We've improved data collection from Amazon RDS MySQL and Amazon Aurora MySQL to only perform remote mysqld checks. Previously, we also collected from the local system, which was confusing.

Bug fixes:

PT-244: The --data-dir option of pt-online-schema-change is broken for partitioned tables

PT-1256: pt-table-sync does not use the character set for the table it is synchronizing

PT-1455: pt-osc gets stuck when the table being altered is filtered out on the slave

PT-1485: The Security section of pt-mysql-summary is broken in versions greater than 5.6

PMM-1905: Explain fails if it encounters a negative ntoreturn

Known Issues:

pt-online-schema-change will lock forever if using --drop-swap under MySQL 8.0.3-rc and 8.0.4-rc due to an error in MySQL:

pt-online-schema-change will lose FK constraints under MySQL 8.0.2-dmr, 8.0.3-rc and 8.0.4-rc due to an error in MySQL:

pt-show-grants can’t handle MySQL 8 roles yet

How to Restore MySQL Logical Backup at Maximum Speed

The ability to restore MySQL logical backups is a significant part of disaster recovery procedures. It’s a last line of defense.
Even if you lose all the data from a production server, physical backups (data-file snapshots created from an offline copy or with Percona XtraBackup) can carry the same internal database structure corruption as the production data. Backups in a simple plain-text format let you avoid such corruption and migrate between database formats (e.g., during a software upgrade or downgrade), or even help with migration from a completely different database solution.
Unfortunately, the restore speed for logical backups is usually poor: for a big database it can take days or even weeks to get the data back. It's therefore important to tune both the backups and MySQL for the fastest possible restore, and to change the settings back before resuming production operations.
All results are specific to my combination of hardware and dataset, but could be used as an illustration for MySQL database tuning procedures related to logical backup restore.
There is no general advice for tuning a MySQL database for a bulk logical backup load, and any parameter should be verified with a test on your hardware and database. In this article, we will explore some variables that help that process. To illustrate the tuning procedure, I’ve downloaded IMDB CSV files and created a MySQL database with pyimdb.
You may repeat the whole benchmark procedure, or just look at settings changed and resulting times.

16GB – InnoDB database size
6.6GB – uncompressed mysqldump sql
5.8GB – uncompressed CSV + create table statements.

The simplest restore procedure for logical backups created by the mysqldump tool:
mysql -e 'create database imdb;'
time mysql imdb < imdb.sql
# real 129m51.389s
This requires slightly more than two hours to restore the backup into the MySQL instance started with default settings.
I'm using the Docker image percona:latest, which contains Percona Server 5.7.20-19, on a laptop with 16GB RAM, an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, and two disks: an SSD (KINGSTON RBU-SNS) and an HDD (HGST HTS721010A9).
Let's start with some "good" settings: a buffer pool bigger than the default, 2x1GB transaction log files, sync disabled (because we are using a slow HDD), and big values for IO capacity. The load should be faster with big batches, so use 1GB for max_allowed_packet.
The values were chosen to be bigger than the MySQL defaults because I'm trying to see the difference from the usually suggested values (like the rule that 80% of RAM should belong to the InnoDB buffer pool).
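For reference, the often-quoted 80%-of-RAM rule mentioned above works out like this on the 16GB test laptop (a trivial sketch; the helper name is mine):

```python
def suggested_buffer_pool_gb(total_ram_gb, fraction=0.8):
    """Rule-of-thumb InnoDB buffer pool size: a fixed fraction of total RAM."""
    return total_ram_gb * fraction

# The test laptop has 16GB RAM, so the classic 80% rule suggests:
print(suggested_buffer_pool_gb(16))  # 12.8 (GB)
```

The benchmark deliberately explores values around this rule rather than taking it as given.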
docker run --publish-all --name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
time (mysql --max_allowed_packet=1G imdb1 < imdb.sql)
# real 59m34.252s
The load is IO-bound, and there is no effect from set global foreign_key_checks=0 and unique_checks=0, because these variables are already disabled in the dump file.
How can we reduce IO?
Disable InnoDB doublewrite: --innodb_doublewrite=0
time (mysql --max_allowed_packet=1G imdb1 < imdb.sql)
# real 44m49.963s
A huge improvement, but we still have an IO-bound load.
We will not be able to improve load time significantly for an IO-bound load. Let's move to SSD:
time (mysql --max_allowed_packet=1G imdb1 < imdb.sql)
# real 33m36.975s
Is it vital to disable disk sync for the InnoDB transaction log?
sudo rm -rf mysql/*
docker rm p57
docker run -v /home/ihanick/Private/Src/tmp/data-movies/imdb.sql:/root/imdb.sql -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql \
--name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
# real 33m49.724s
There is no significant difference.
By default, mysqldump produces SQL data, but it could also save data to CSV format:
cd /var/lib/mysql-files
mkdir imdb
chown mysql:mysql imdb/
time mysqldump --max_allowed_packet=128M --tab /var/lib/mysql-files/imdb imdb1
# real 1m45.983s
sudo rm -rf mysql/*
docker rm p57
docker run -v /srv/ihanick/tmp/imdb:/var/lib/mysql-files/imdb -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql \
--name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
time (
mysql -e 'drop database imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
for i in $PWD/*.txt ; do mysqlimport imdb1 $i ; done
)
# real 21m56.049s
1.5X faster, just because of changing the format from SQL to CSV!
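The 1.5x figure follows directly from the two timings reported above (33m36.975s for the SQL file on SSD vs. 21m56.049s for CSV):

```python
def to_seconds(minutes, seconds):
    """Convert a `time` output like 33m36.975s into seconds."""
    return minutes * 60 + seconds

sql_load = to_seconds(33, 36.975)  # single SQL file, SSD
csv_load = to_seconds(21, 56.049)  # CSV via mysqlimport, SSD
print(round(sql_load / csv_load, 2))  # ~1.53
```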
We're still using only one CPU core; let's improve the load with the --use-threads=4 option:
time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
mysqlimport --use-threads=4 imdb1 $PWD/*.txt
)
# real 15m38.147s
In the end, the load is still not fully parallel because of one big table: all the other tables finish loading, but one thread remains active.
Let’s split CSV files into smaller ones. For example, 100k rows in each file and load with GNU/parallel:
# /var/lib/mysql-files/imdb/
apt-get update ; apt-get install -y parallel
cd /var/lib/mysql-files/imdb
time (
mkdir -p split1 ; cd split1
for i in ../*.txt ; do echo $i ; split -a 6 -l 100000 -- $i `basename $i .txt`. ; done
for i in `ls *.* | sed 's/^[^.]*\.//' | sort -u` ; do
mkdir ../split-$i
for j in *.$i ; do mv $j ../split-$i/${j/$i/txt} ; done
done
)
# real 2m26.566s
time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
parallel 'mysqlimport imdb1 /var/lib/mysql-files/imdb/{}/*.txt' ::: split-*
)
# real 16m50.314s
Splitting is not free, but you can split your dump files right after taking the backup.
The load is parallel now, but the single big table strikes back with 'setting auto-inc lock' in SHOW ENGINE INNODB STATUS\G.
Using the --innodb_autoinc_lock_mode=2 option fixes this issue: 16m2.567s.
We got slightly better results with just mysqlimport --use-threads=4. Let's check whether hyperthreading helps and whether the problem is caused by the "parallel" tool:

Using four parallel jobs for load: 17m3.662s
Using four parallel jobs for load and two threads: 16m4.218s

There is no difference between GNU/parallel and the --use-threads option of mysqlimport.
Why 100k rows per file? With 500k rows: 15m33.258s.
Now the performance is better than with mysqlimport --use-threads=4 alone.
How about 1M rows at once? 16m52.357s, so bigger is not always better.
With bigger transaction logs (2x4GB) I see a periodic "flushing logs" message, and the time drops to 12m18.160s:
--innodb_buffer_pool_size=4GB --innodb_log_file_size=4G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2 --performance-schema=0
Let's compare the numbers with myloader 0.6.1, also running with four threads (myloader has only a -d parameter; each myloader execution time is listed under the corresponding mydumper command):
# oversized statement size to get 0.5M rows in one statement, single statement per chunk file
mydumper -B imdb1 --no-locks --rows 500000 --statement-size 536870912 -o 500kRows512MBstatement
mydumper -B imdb1 --no-locks -o default_options
mydumper -B imdb1 --no-locks --chunk-filesize 128 -o chunk128MB
mydumper -B imdb1 --no-locks --chunk-filesize 64 -o chunk64MB
It would be great to test mydumper with CSV format, but unfortunately that hasn't been implemented in the last 1.5 years:
Returning to the parallel CSV file load with even bigger transaction logs (2x8GB): 11m15.132s.
What about a bigger buffer pool (--innodb_buffer_pool_size=12G)? 9m41.519s
Let’s check six-year-old server-grade hardware: Intel(R) Xeon(R) CPU E5-2430 with SAS raid (used only for single SQL file restore test) and NVMe (Intel Corporation PCIe Data Center SSD, used for all other tests).
I'm using options similar to the previous tests, with the 100k-row split for the CSV file load:
--innodb_buffer_pool_size=8GB --innodb_log_file_size=8G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2

The single SQL file created by mysqldump loaded in 117m29.062s, i.e. 2x slower.
24 parallel processes of mysqlimport: 11m51.718s
Again, hyperthreading makes a huge difference! With only 12 parallel jobs: 18m3.699s.
Due to the higher concurrency, the adaptive hash index becomes a source of locking contention. After disabling it with --skip-innodb_adaptive_hash_index: 10m52.788s.
In many places, disabling unique checks is referred to as a performance booster; here it made no difference: 10m52.489s.
You can spend more time reading advice about unique_checks, but it might only help databases with many unique indexes (in addition to the primary key).
The buffer pool is smaller than the dataset; can changing the old/new page split make inserts faster? No: --innodb_old_blocks_pct=5 : 10m59.517s.
O_DIRECT is also often recommended: --innodb_flush_method=O_DIRECT: 11m1.742s.
O_DIRECT does not improve performance by itself, but it helps if it lets you use a bigger buffer pool: O_DIRECT + a 30% bigger buffer pool (--innodb_buffer_pool_size=11G): 10m46.716s.


There is no universal recipe for improving the logical backup restore procedure.
If your restore is IO-bound, disable the InnoDB doublewrite buffer. It's safe because even if the database crashes during restore, you can restart the operation.
Do not use SQL dumps for databases larger than 5-10GB. CSV files are much faster than mysqldump+mysql: use mysqldump --tab + mysqlimport, or mydumper/myloader with an appropriate chunk-filesize.
The number of rows per LOAD DATA INFILE batch is important: usually 100K-1M; use binary search (2-3 iterations) to find a good value for your dataset.
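The binary-search advice can be sketched as follows: probe the ends of the 100K-1M range and bisect toward the faster side for a few iterations. The cost function here is a stand-in for an actual timed restore run (illustrative only):

```python
def find_batch_size(time_load, low=100_000, high=1_000_000, iterations=3):
    """Bisect toward the faster batch size; time_load(rows) returns seconds."""
    for _ in range(iterations):
        mid = (low + high) // 2
        # Keep the half whose endpoint loads faster.
        if time_load(low) < time_load(high):
            high = mid
        else:
            low = mid
    return (low + high) // 2

# Stand-in cost model with its minimum near 500k rows (illustrative only):
cost = lambda rows: abs(rows - 500_000)
print(find_batch_size(cost))  # 493750, close to the real optimum
```

In practice each `time_load` probe is a full timed import of one table, so two or three iterations are all you can usually afford.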
InnoDB log file size and buffer pool size are really important options for backup restore performance.
O_DIRECT reduces insert speed on its own, but it's a win if it lets you increase the buffer pool size.
If you have enough RAM or SSD, the restore procedure is limited by CPU. Use a faster CPU (higher frequency, turboboost).
Hyperthreading also counts.
A powerful server could be slower than your laptop (12×2.4GHz vs. 4×2.8+turboboost).
Even with modern hardware, it's hard to expect a backup restore faster than 50MB/s (measured against the final size of the InnoDB database).
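That ceiling is easy to sanity-check against the best laptop result above (the 16GB InnoDB database restored in 9m41.519s):

```python
db_size_mb = 16 * 1024           # final InnoDB database size from the post
best_restore_s = 9 * 60 + 41.519  # best laptop result above
print(round(db_size_mb / best_restore_s, 1))  # ~28.2 MB/s, well under 50MB/s
```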
You can find a lot of different advice on how to improve backup load speed. Unfortunately, it’s not possible to implement improvements blindly, and you should know the limits of your system with general Unix performance tools like vmstat, iostat and various MySQL commands like SHOW ENGINE INNODB STATUS (all can be collected together with pt-stalk).
Percona Monitoring and Management (PMM) also provides good graphs, but you should be careful with QAN: full slow query log during logical database dump restore can cause significant processing load.
Default MySQL settings can cost you a 10x backup restore slowdown.
This benchmark is aimed at speeding up the restore procedure while the application is not running and the server is not used in production. Make sure you revert all configuration parameters back to their production values after the load. For example, if you disable the InnoDB doublewrite buffer during restore and forget to re-enable it for production, you risk serious data corruption from partial InnoDB page writes.
If the application is running during restore, in most cases you will end up with an inconsistent database, because the restore methods discussed above lack proper locking and transactional guarantees.

Percona Live 2018 Featured Talk – Scaling a High-Traffic Database: Moving Tables Across Clusters with Bryana Knight

Welcome to the first interview blog for the upcoming Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.
This blog post highlights Bryana Knight, Platform Engineer at GitHub. Her talk is titled Scaling a High-Traffic Database: Moving Tables Across Clusters. Facing an immediate need to distribute load, GitHub came up with creative ways to move a significant amount of traffic off of their main MySQL cluster – with no user impact. In our conversation, we discussed how Bryana and GitHub solved some of these issues:
Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?
Bryana: I started at GitHub as a full-stack engineer working on a new business offering, and was shortly afterwards offered the opportunity to transition to the database services team. Having spent my whole career as a full-stack engineer, I had to level up pretty quickly on MySQL, data modeling, data access patterns: basically everything databases. I spent the first few months learning our schema and setup through lots of reading, mentorship from other members of my team, reviewing migrations for most of our tables, and asking a million questions.
Originally, my team spent a lot of time addressing immediate performance concerns. Then we started partnering with product engineering teams to build out the backends for new features. Now we are focused on the long-term scalability and availability of our database, stemming from how we access it. I work right between our DBAs and our product and API engineers.
Percona: Your talk is titled “Scaling a High-Traffic Database: Moving Tables Across Clusters”. What were the challenges GitHub faced that required redistributing your tables?
Bryana: The biggest part of the GitHub codebase is an 8-year-old monolith. As a company, we've been fortunate enough to see a huge amount of user growth since the company started. User growth means data growth. The schema and setup that worked for GitHub early on, and very much allowed GitHub to get to where it is today with tons of features and an extremely robust API, is not necessarily the right schema and setup for the size GitHub is today.
We were seeing that higher than “normal” load was starting to have a more noticeable effect. The monolith aspect of our database, organic growth, plus inefficiencies in our code base were putting a lot of pressure on the master of our primary database cluster, which held our most core tables (think users, repos, permissions). From the database perspective, this meant contention, locking, and replica lag. From the user’s perspective, this meant anything from longer page loads to delays in UI updates and notifications, to timeouts. 
Percona: What were some of the other options you looked at (if any)?
Bryana: Moving tables out of our main cluster was not the only action we took to alleviate some of the pressure in our database. However, it was the highest impact change we could make in the medium-term to give us the breathing room we needed and improve performance and availability. We also prioritized efforts around moving more reads to replicas and off the master, throttling more writes where possible, index improvements and query optimizations. Moving these tables gave us the opportunity to start thinking more long-term about how we can store and access our data differently to allow us to scale horizontally while maintaining our healthy pace of feature development.
Percona: What were the issues that needed to be worked out between the different teams you mention in your description? How did they impact the project?
Bryana: Moving tables out of our main database required collaboration between multiple teams. The team I’m on, database-services, was responsible for coming up with the strategy to move tables without user impact, writing the code to handle query isolation and routing, connection switching, backgrounding writes, and so on. Our database-infrastructure team determined where the tables we were moving should go (new cluster or existing), setup the clusters, and advised us on how to safely copy the data. In some cases, we were able to use MySQL replication. When that wasn’t possible, they weighed in on other options. 
We worked with production engineers to isolate data access to these tables and safely split JOINs with other tables. Everybody needed to be sure we weren’t affecting performance and user experience when doing this. We discussed with our support team the risk of what we were doing. Then we worked with them to determine if we should preemptively status yellow when there was a higher risk of user impact. During the actual cut-overs, representatives from all these groups would get on a war-room-like video call and “push the button”, and we always made sure to have a roll-out and roll-back plan. 
Percona: Why should people attend your talk? What do you hope people will take away from it?
Bryana: In terms of database performance, there are a lot of little things you can do immediately to try and make improvements: things like adding indexes, tweaking queries, and denormalizing data. There are also more drastic, architectural changes you can pursue, that many companies need to do when they get to certain scale. The topic of this talk is a valid strategy that fits between these two extremes. It relieved some ongoing performance problems and availability risk, while giving us some breathing room to think long term. I think other applications and databases might be in a similar situation and this could work for them. 
Percona: What are you looking forward to at Percona Live (besides your talk)?
Bryana: This is actually the first time I'm attending a Percona Live conference. I'm hoping to learn from some of the talks around scaling a high-traffic database and sharding. I'm also looking forward to seeing some talks from the wonderful folks on GitHub's database-infrastructure team.
Want to find out more about this Percona Live 2018 featured talk, and Bryana and GitHub’s migration? Register for Percona Live 2018, and see her talk Scaling a High-Traffic Database: Moving Tables Across Clusters. Register now to get the best price!
Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.
The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Update on Percona Platform Lifecycle for Ubuntu “Stable” Versions

This blog post highlights changes to the Percona Platform Lifecycle for Ubuntu “Stable” Versions.
We have recently made some changes to our Percona Platform and Software Lifecycle policy in an effort to more strongly align with upstream Linux distributions. As part of this, we’ve set our timeframe for providing supported builds for Ubuntu “Stable” (non-LTS) releases to nine (9) months. This matches the current Ubuntu distribution upstream policy.
In the future, we will continue to shift as necessary to match the upstream policy specified by Canonical. Along with this, as we did with Debian 9 before, we will only produce 64-bit builds for this platform going forward. It has been our intention for some time to slowly phase out 32-bit builds, as they are rarely downloaded and largely unnecessary nowadays.
If you have any questions or concerns, please feel free to contact Percona Support or post on our Community Forums.

Want IST Not SST for Node Rejoins? We Have a Solution!

What if we told you that there is a sure way to get IST, not SST, for node rejoins? You can guarantee that a rejoining node uses IST. Sound interesting? Keep reading.
Normally, when a node is taken out of the cluster for a short period (for maintenance or shutdown), the gcache on the other nodes of the cluster donates the missing write-set(s) when the node rejoins. This works if you have configured a large enough gcache, or if the downtime is short enough. Neither approach is ideal, especially for a production cluster: keeping a larger gcache for the server's lifetime means permanently committing more disk space when the same job could be done with less.
Re-configuring the gcache on a potential DONOR node before the downtime requires a node shutdown (dynamic resizing of the gcache is not possible, or rather not needed now). Restoring it to the original size needs another shutdown. So: three shutdowns for a single maintenance window. That is not acceptable for busy production clusters, and it multiplies the possibility of errors.
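To see why gcache sizing matters, here is a back-of-the-envelope estimate: the gcache must retain roughly write-set-rate x downtime worth of data for IST to remain possible. This sketch shows the idea (all numbers and the safety factor are illustrative, not from the post):

```python
def min_gcache_bytes(write_set_rate_bps, downtime_seconds, safety=1.2):
    """Smallest gcache that still allows IST after `downtime_seconds`,
    with a safety factor for write bursts (illustrative rule of thumb)."""
    return round(write_set_rate_bps * downtime_seconds * safety)

# e.g. 2MB/s of write-sets and a 30-minute maintenance window:
print(min_gcache_bytes(2 * 1024**2, 30 * 60) / 1024**3)  # 4.21875 (GB)
```

With gcache.freeze_purge_at_seqno, you no longer have to pre-provision for the worst case this way.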
Introducing “gcache.freeze_purge_at_seqno”
Given this pain point, we are introducing gcache.freeze_purge_at_seqno in Percona XtraDB Cluster 5.7.20. It controls the purging of the gcache, thereby retaining more data to facilitate IST when the node rejoins.
Every transaction in the Galera cluster is assigned a unique global sequence number (seqno). Tracking happens through this seqno (wsrep_last_applied, wsrep_last_committed, wsrep_replicated, wsrep_local_cached_downto, etc.). wsrep_local_cached_downto represents the sequence number down to which the gcache has been purged: if wsrep_local_cached_downto = N, the gcache holds data for [N, wsrep_replicated] and has purged data for [1, N).
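A toy model (not Galera code; the names mirror the status variables above) makes the IST condition concrete: a rejoining node can use IST only if the donor's gcache still holds every write-set the joiner is missing:

```python
def ist_possible(joiner_last_applied, cached_downto, replicated):
    """True if the donor's gcache [cached_downto, replicated] covers every
    write-set the joiner is missing, i.e. (joiner_last_applied, replicated]."""
    first_missing = joiner_last_applied + 1
    return cached_downto <= first_missing <= replicated + 1

# Donor has purged everything below seqno 1000 and replicated up to 5000:
print(ist_possible(1500, 1000, 5000))  # True  -> IST
print(ist_possible(800, 1000, 5000))   # False -> full SST
```

Freezing the purge simply pins cached_downto in place while the node is away, keeping the first condition true.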
gcache.freeze_purge_at_seqno takes three values:

-1 (default): no freeze, the purge operates as normal.

x (should be a valid seqno in the gcache): freeze purging of write-sets >= x. The best way to select x is to start from the wsrep_last_applied value on the node you plan to shut down, reduced by roughly 10% (the extra retained history accounts for the safety-gap heuristic of IST).

now: freeze purging of write-sets >= the smallest seqno currently in the gcache, i.e. an instant freeze of gcache purging. (If tracking x (above) is difficult, simply use "now" and you are good.)

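Following the guidance above for choosing x, the computation is trivial (a sketch; the 10% margin mirrors the advice above, and the function name is mine):

```python
def freeze_seqno(wsrep_last_applied, margin=0.10):
    """Seqno at which to freeze gcache purging on the DONOR, keeping ~10%
    extra history as a cushion for IST's safety-gap heuristic."""
    return int(round(wsrep_last_applied * (1 - margin)))

# If the node to be shut down reports wsrep_last_applied = 100000:
print(freeze_seqno(100_000))  # 90000
```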
Set this on an existing node of the cluster (that will continue to be part of the cluster and can act as potential DONOR). This node continues to retain the write-sets, thereby allowing the restarting node to rejoin using IST. (You can feed the said node as a preferred DONOR through wsrep_sst_donor while restarting the said rejoining node.)
Remember to set it back to -1 once the node rejoins. This avoids hogging space on the DONOR beyond the said timeline. On the next purge cycle, all the old retained write-sets are freed as well (reclaiming the space back to original).
To find the existing value of gcache.freeze_purge_at_seqno, query wsrep_provider_options:
select @@wsrep_provider_options;
To set gcache.freeze_purge_at_seqno:
set global wsrep_provider_options="gcache.freeze_purge_at_seqno = now";

Why should you use it?

The gcache grows dynamically (using the existing page-store mechanism) and shrinks once you set the option back to -1. This means you use the extra disk space only when needed.
No restart is needed; you can concentrate on the maintenance node only.
No complex math or understanding of seqno is involved (simply use "now").
It is less prone to error, as SST is one of the major error-prone areas with the cluster.

So why wait? Give it a try! It is available in Percona XtraDB Cluster 5.7.20 onwards, and it helps you get IST, not SST, for node rejoins.
Note: If you need more information about gcache, check here and here.

Tutorial Schedule for Percona Live 2018 Is Live

Percona has revealed the line-up of in-depth tutorials for the Percona Live 2018 Open Source Database Conference, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, Calif. Secure your spot now with Advanced Registration prices (available until March 4, 2018). Sponsorship opportunities for the conference are still available.
Percona Live 2018 Open Source Database Conference is the premier open source database event. The theme for the upcoming conference is “Championing Open Source Databases,” with a range of topics on MySQL, MongoDB and other open source databases, including time series databases, PostgreSQL and RocksDB. Session tracks include Developers, Operations and Business/Case Studies. 
Tutorials take place throughout the day on April 23, 2018. Tutorials provide practical, in-depth knowledge of critical open source database issues. Topics include:

Amazon Aurora with PostgreSQL Compatibility – Jignesh Shah, Amazon Web Services
The Full MySQL (5.6, 5.7 and 8.0) and MariaDB (10.0 and 10.1) Parallel Replication Tutorial – Jean-François Gagné and Eduardo Ortega
Orchestrator High Availability  – Shlomi Noach, GitHub
MySQL Migrations – Gillian Gunson, GitHub; Brian Cain, DropBox; Mark Filipi, SurveyMonkey
InnoDB Tutorial for MySQL 8.0 – Peter Zaitsev, Percona
Percona XtraDB Cluster – Matthew Boehm, Percona
PostgreSQL 10 Highlights & Deep Dive – Robert Treat, OmniTI
MySQL InnoDB Cluster in a Nutshell – Frederic Descamps, Oracle; Kenny Gryp, Percona
Monitoring MySQL Performance with Percona Monitoring and Management (PMM) – Michael Coburn, Percona
MongoDB Basics Tutorial – Rick Golba and Adamo Tonete, Percona
MongoDB Advanced Tutorial – Barrett Chambers and David Murphy, Percona
MySQL Performance Schema in Action – Sveta Smirnova and Alexander Rubin, Percona
MySQL in a Nutshell – Dimitri Vanoverbeke and Tom De Cooman, Percona
Hands-On ProxySQL – Rene Cannao, ProxySQL; Derek Downey, Pythian
MySQL Break/Fix Lab – Pep Pla Sanz and Matthias Crauwels, Pythian
The Accidental DBA – Jenni Snyder, Yelp
Getting Started with Vitess – Sugu Sougoumarane, YouTube; Rafael Chacon, Slack

Hyatt Regency Santa Clara & The Santa Clara Convention Center
Percona Live 2018 Open Source Database Conference will be held at the Hyatt Regency Santa Clara & The Santa Clara Convention Center, at 5101 Great America Parkway Santa Clara, CA 95054.
The Hyatt Regency Santa Clara & The Santa Clara Convention Center is a prime location in the heart of the Silicon Valley. Enjoy this spacious venue with complimentary wifi, on-site expert staff and three great restaurants. You can reserve a room by booking through the Hyatt’s dedicated Percona Live reservation site.
Book your hotel using Percona’s special room block rate!
Sponsorship opportunities for Percona Live 2018 Open Source Database Conference are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. Contact for sponsorship details.

Diamond Sponsors – Continuent, VividCortex
Gold Sponsors – Facebook, Grafana
Bronze Sponsors – SolarWinds, TwinDB, Yelp
Media Sponsors – Datanami, EnterpriseTech, HPCWire,
