Month: March 2018

ClickHouse and ProxySQL query rewrites

ClickHouse and MySQL do not share the same SQL syntax and functions.
ProxySQL is a popular open source, high-performance, protocol-aware proxy server for MySQL and its forks.
Back in September 2017 we announced the introduction of ClickHouse as a backend.
ClickHouse is an open source column-oriented database management system capable of real time generation of analytical data reports using SQL queries.
To support ClickHouse as a backend, ProxySQL acts as a data bridge between MySQL protocol and ClickHouse protocol, allowing MySQL clients to execute queries in ClickHouse through it.
When we started using it in real-world scenarios, we faced an expected issue: ClickHouse's SQL syntax differs from MySQL's, so migrating an application from MySQL to ClickHouse isn't just a matter of changing the connection endpoint; it also requires modifying some queries. That needs development time, which is not always available.
One of ProxySQL's most widely used features is the ability to rewrite queries, so often it is just a matter of writing the right query rules.
Let's look at an example.
We originally had this query:
SELECT COUNT(`id`), FROM_UNIXTIME(`created`, '%Y-%m') AS `date` FROM `tablename` GROUP BY FROM_UNIXTIME(`created`, '%Y-%m')
ClickHouse doesn't support FROM_UNIXTIME, but it supports toDate and toTime.
ClickHouse also supports toYear and toMonth, useful to format the date the same way FROM_UNIXTIME does.
Therefore, it is possible to rewrite the query as:
SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY toYear(toDate(created)), toMonth(toDate(created));
To perform the above rewrite we would need two rules: one for the FROM_UNIXTIME in the select list, and one for the one in the GROUP BY clause. Or we can use just one rewrite rule to replace FROM_UNIXTIME(created, '%Y-%m') wherever it appears, generating the following query:
SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created))));
Does it look great? No, not yet!
For the month of March, concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) returns 2018-3: not what the application expects, as MySQL would return 2018-03. The same applies to the first 9 months of every year.
Finally, we rewrote the query as follows, and the application was happy:
SELECT COUNT(`id`), substring(toString(toDate(created)),1,7) AS `date`
FROM `tablename`
GROUP BY substring(toString(toDate(created)),1,7);
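The padding difference that motivated this final rewrite is easy to check outside the database. Here is a small Python sketch (the timestamp is invented) comparing the three formatting approaches discussed above:

```python
from datetime import datetime, timezone

# An invented timestamp in March 2018 (month < 10, so padding matters).
created = int(datetime(2018, 3, 31, 12, 0, tzinfo=timezone.utc).timestamp())
dt = datetime.fromtimestamp(created, tz=timezone.utc)

# MySQL: FROM_UNIXTIME(created, '%Y-%m') -> zero-padded month.
mysql_style = dt.strftime("%Y-%m")

# ClickHouse concat(toYear, '-', toMonth): no zero padding on the month.
concat_style = f"{dt.year}-{dt.month}"

# ClickHouse substring(toString(toDate(created)), 1, 7): padding preserved,
# because toString(toDate(...)) yields an ISO date like 2018-03-31.
substring_style = dt.date().isoformat()[:7]

print(mysql_style, concat_style, substring_style)  # 2018-03 2018-3 2018-03
```

The substring approach works because the ISO date representation is always zero-padded, so its first seven characters match MySQL's %Y-%m output.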
Note: because of the datatype conversions ClickHouse needs to perform to execute the above query, its execution time is about 50% slower than that of the following query:
SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY toYear(toDate(created)), toMonth(toDate(created));
Architecture using 2 ProxySQLs
Great, we now know how to rewrite the query!
However, the ClickHouse module in ProxySQL doesn't support query rewrites. It is only responsible for translating data between the MySQL and ClickHouse protocols, and vice versa.
Therefore, the way to achieve this is to configure two ProxySQL layers: one instance responsible for rewriting the query and sending the rewritten query to the second ProxySQL instance, which is responsible for executing the (already modified) query on ClickHouse.
Architecture using only one ProxySQL
Does the above architecture seem complex? Not really; it is reasonably straightforward.
Can it be improved?
As you can see from the previous chart, the ClickHouse module and the MySQL module listen on different ports. The first ProxySQL instance receives traffic on port 6033 and sends it to the second ProxySQL instance on port 6090.
Are two instances really required? The answer is no.
In fact, a single instance can receive MySQL traffic on port 6033, rewrite the query, and send the rewritten query to itself on port 6090, to finally execute the rewritten query on ClickHouse.
This diagram describes the architecture:
For reference, below are the steps to configure a single ProxySQL to send traffic to ClickHouse, using itself as a backend.
Create ClickHouse user:
INSERT INTO clickhouse_users (username,password) VALUES ('clicku','clickp');
Create MySQL user (same as ClickHouse):
INSERT INTO mysql_users(username,password) SELECT username, password FROM clickhouse_users;
Configure ProxySQL itself as a backend for MySQL traffic:
INSERT INTO mysql_servers(hostname,port) VALUES ('',6090);
Create a query rule for rewriting queries:
INSERT INTO mysql_query_rules (active,match_pattern,replace_pattern,re_modifiers) VALUES
(1,"FROM_UNIXTIME\(`created`, '%Y-%m'\)",'substring(toString(toDate(created)),1,7)',"CASELESS,GLOBAL");
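What this rule does can be illustrated with an ordinary regular-expression substitution. A Python sketch (using the re module as a stand-in for ProxySQL's regex engine; re.IGNORECASE plays the role of CASELESS, and re.sub replaces every occurrence, like GLOBAL):

```python
import re

# The same match/replace pair as in the mysql_query_rules row above.
match_pattern = r"FROM_UNIXTIME\(`created`, '%Y-%m'\)"
replace_pattern = "substring(toString(toDate(created)),1,7)"

# The original application query.
query = ("SELECT COUNT(`id`), FROM_UNIXTIME(`created`, '%Y-%m') AS `date` "
         "FROM `tablename` GROUP BY FROM_UNIXTIME(`created`, '%Y-%m')")

# Replace both occurrences: in the select list and in the GROUP BY clause.
rewritten = re.sub(match_pattern, replace_pattern, query, flags=re.IGNORECASE)
print(rewritten)
```

Both occurrences of FROM_UNIXTIME are replaced by the single rule, which is why one rewrite rule suffices.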
This is a very simple example to demonstrate how to perform query rewrite from MySQL to ClickHouse using just one ProxySQL instance.
In real-world scenarios you will need to create more rules based on your own queries.
Not only does ProxySQL allow sending queries to ClickHouse, it also allows rewriting them to work around differences in SQL syntax and available functions.
To achieve this, ProxySQL uses its ability to use itself as a backend: rewrite the query in the MySQL module, and execute it in the ClickHouse module.

MySQL NDB Cluster Backups

Today – 31 March – is world backup day, so I thought I would write a little about backups in MySQL NDB Cluster.
Just because NDB Cluster offers built-in redundancy and high availability does not mean backups are not important. They are – as ever and as for everything in software. The redundancy does not protect against user errors (anyone ever executed DROP TABLE or DROP SCHEMA by accident?), nor does it protect against a fire, a natural disaster, or another catastrophe hitting the data center. The same goes for high availability.
In short, if the data is in any way remotely important to you, ensure you have a backup. Furthermore, a backup is worth no more than your ability to restore it. If a fire ravages your data center, the best backup in the world does not help if it is hosted in that same data center.
So, before actually creating and restoring a backup, let us look at two best practices when it comes to backups.
Best Practices
The best practices mentioned here are by no means unique to MySQL NDB Cluster, nor even to databases. They are not exhaustive either, but are meant as guidelines to have in mind when designing your backups.
Use a Backup Method that Works with Your Product
It sounds pretty obvious – why would you ever use a backup solution that does not work? Obviously no one does that on purpose, but unfortunately it is too common that it has not been checked whether the backup solution is appropriate.
With respect to MySQL NDB Cluster, an rsync of the NDB file system will not work, and neither will any other method of creating a binary backup from the file system (including MySQL Enterprise Backup). Nor does it work to use mysqldump unless you keep the cluster read-only, for example by putting the cluster into single user mode and locking all tables.
When you test your backups make sure that you make changes to the data while the backup is running. A backup method may work when the database is idle, but not when concurrent writes are occurring.
In a little bit, I will show what the recommended way to create an online backup in NDB Cluster is.
Ensure You Can Restore Your Backups
There are two parts to this: can you retrieve your backups even in the worst case scenario, and do you know how to restore your backups?
You cannot assume that a backup that is kept locally on the same host or even in the same data center will be available when you need it. Think in terms of a major disaster such as the entire data center gone. Is it likely to happen? Fortunately not, but from time to time really bad things happen: fires, earthquakes, flooding, etc. Even if it is a once-in-a-century event, do you want to run the risk?
So, ensure you are copying your backups off site. How far away you need to copy it depends on several factors, but at least ensure it is not in the same suburb.
The other aspect is that too often, the first time a restore is attempted is when there is a total outage and everyone is in panic mode. That is not the optimal time to learn about the restore requirements and gotchas. Make it routine to restore backups. It serves two purposes: it validates your backups – see the previous best practice – and it validates your steps for restoring a backup.
Creating a Backup
It is very easy to create an online backup of a cluster using MySQL NDB Cluster, as the functionality is built-in. In the simplest of cases, it is as trivial as executing the START BACKUP command in the ndb_mgm client, for example:
shell$ ndb_mgm --ndb-connectstring=localhost:1186 \
> --execute="START BACKUP"
Connected to Management Server at: localhost:1186
Waiting for completed, this may take several minutes
Node 1: Backup 1 started from node 49
Node 1: Backup 1 started from node 49 completed
StartGCP: 4970 StopGCP: 4973
#Records: 4025756 #LogRecords: 1251
Data: 120749052 bytes Log: 50072 bytes
Each backup has a backup ID. In the above example, the ID is 1 ("Backup 1 started from …"). When a backup is started without specifying a backup ID, MySQL NDB Cluster determines the previously highest used ID and adds one to it. However, while this is convenient, it means the backup ID carries no information other than the sequence in which the backups were made.
An alternative is to explicitly request a given ID. Supported IDs are 1 through 4294967294. One option is to choose the ID as YYmmddHHMM, where YY is the year, mm the month, dd the day, HH the hour in 24-hour format, and MM the minutes. Zero-pad each number if the value is less than 10. This makes the backup ID reflect when the backup was created.
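The YYmmddHHMM scheme can be sketched in a few lines of Python (the helper name is mine, not part of NDB Cluster):

```python
from datetime import datetime

def backup_id(ts: datetime) -> int:
    """Encode a timestamp as a YYmmddHHMM backup ID (zero-padded fields)."""
    bid = int(ts.strftime("%y%m%d%H%M"))
    # NDB Cluster accepts backup IDs 1 through 4294967294, so this scheme
    # stays within range until the year 2043.
    assert 1 <= bid <= 4294967294
    return bid

print(backup_id(datetime(2018, 3, 31, 16, 3)))  # 1803311603
```

The strftime format string does the zero-padding automatically, so March (month 3) becomes "03" in the resulting ID.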
To specify the backup ID explicitly, give the requested ID as the first argument after START BACKUP, for example (using the interactive mode of ndb_mgm this time):
ndb_mgm> START BACKUP 1803311603
Waiting for completed, this may take several minutes
Node 1: Backup 1803311603 started from node 49
Node 1: Backup 1803311603 started from node 49 completed
StartGCP: 5330 StopGCP: 5333
#Records: 4025756 #LogRecords: 1396
Data: 120749052 bytes Log: 55880 bytes
Here the backup ID is 1803311603 meaning the backup was created on 31 March 2018 at 16:03.
There are other arguments that can be used, for example to specify whether the snapshot time (where the backup is consistent) should be at the start or at the end (the default) of the backup. The HELP START BACKUP command can be used to get online help for the START BACKUP command.

Remember that START BACKUP only backs up NDBCluster tables. Use mysqldump, mysqlpump, or another backup program to back up the schema and/or non-NDBCluster tables.

Restoring a Backup
It is a little more complicated to restore a backup than to create one, but once you have tried it a few times, it should not pose any major issues.
Backups are restored using the ndb_restore program. It is an NDB API program that supports restoring both the schema and the data. It is recommended to perform the restore in three steps:

Restore the schema.
Restore the data with indexes disabled.
Rebuild the indexes.

In MySQL NDB Cluster 7.4 and earlier, restoring the schema with ndb_restore did not change the number of partitions to the default of the cluster you restore to. If you have not yet upgraded to MySQL NDB Cluster 7.5, it is recommended to restore the schema from a mysqldump or mysqlpump backup if the cluster does not have the same number of data nodes and LDM threads.

The restore examples assume you are restoring into an empty cluster. There is also support for partial restores and renaming tables, but that will not be discussed here. Let us take a look at the three steps.
Step 1: Restore the Schema
The schema is restored using the --restore_meta option, for example:
shell$ ndb_restore --ndb-connectstring=localhost:1186 \
> --nodeid=1 --backupid=1803311603 \
> --backup_path=/backups/cluster/BACKUP/BACKUP-1803311603 \
> --restore_meta --disable-indexes
Nodeid = 1
Backup Id = 1803311603
backup path = /backups/cluster/BACKUP/BACKUP-1803311603
2018-03-31 16:28:07 [restore_metadata] Read meta data file header
Opening file '/backups/cluster/BACKUP/BACKUP-1803311603/BACKUP-1803311603.1.ctl'
File size 47368 bytes
Backup version in files: ndb-6.3.11 ndb version: mysql-5.7.21 ndb-7.5.9
2018-03-31 16:28:07 [restore_metadata] Load content
Stop GCP of Backup: 5332
2018-03-31 16:28:07 [restore_metadata] Get number of Tables
2018-03-31 16:28:07 [restore_metadata] Validate Footer
Connected to ndb!!
2018-03-31 16:28:08 [restore_metadata] Restore objects (tablespaces, ..)
2018-03-31 16:28:08 [restore_metadata] Restoring tables
Successfully restored table `world/def/country`

2018-03-31 16:28:11 [restore_data] Start restoring table data

NDBT_ProgramExit: 0 – OK
The arguments used here are:

--ndb-connectstring=localhost:1186. The host and port number to use when connecting to the management node(s). This example is from a test cluster with all nodes on the same host. In general you will not be specifying localhost here (never ever have the management and data nodes on the same host or even the same physical server – a topic for another day).

--nodeid=1. This specifies which node ID to restore from, based on the node ID in the cluster where the backup was created. Either data node can be used.

--backupid=1803311603. The backup ID to restore.

--backup_path=…. The location of the backup files.

--restore_meta. Restore the schema (also called metadata).

--disable-indexes. Do not restore the indexes (we will rebuild them later).

You may wonder why we do not want to restore the indexes. I will get back to that after the restore has been completed.
You should only execute this command once, and only for one node ID. Before proceeding to the next step, ensure this step completed without errors. The next step is to restore the data.
Step 2: Restore the Data
The command to restore the data is very similar to the one for restoring the schema. The main differences are that --restore_meta is replaced by --restore_data, and that ndb_restore should be run once for each data node that was in the cluster where the backup was created.
For example in case of two data nodes:
shell$ ndb_restore --ndb-connectstring=localhost:1186 \
> --nodeid=1 --backupid=1803311603 \
> --backup_path=/dev/shm/backup/BACKUP/BACKUP-1803311603 \
> --restore_data --disable-indexes

shell$ ndb_restore --ndb-connectstring=localhost:1186 \
> --nodeid=2 --backupid=1803311603 \
> --backup_path=/dev/shm/backup/BACKUP/BACKUP-1803311603 \
> --restore_data --disable-indexes
These steps can be run in parallel as long as that does not overload the data nodes. A rule of thumb is that you can execute one ndb_restore --restore_data per host you have data nodes on. That is, if you have one data node per host, you can restore all parts in parallel; if you have two data nodes per host, it may be necessary to divide the restore into two parts.
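The per-node invocations above follow a fixed pattern, so they can be generated with a small script. A Python sketch (the connect string, backup ID, and path follow the examples above; the two-node layout is an assumption):

```python
# Generate the ndb_restore command line for each data node in the backup.
bid = 1803311603
backup_path = f"/dev/shm/backup/BACKUP/BACKUP-{bid}"

def restore_cmd(node_id):
    """Build the data-restore command for one node ID of the backed-up cluster."""
    return [
        "ndb_restore",
        "--ndb-connectstring=localhost:1186",
        f"--nodeid={node_id}",
        f"--backupid={bid}",
        f"--backup_path={backup_path}",
        "--restore_data",
        "--disable-indexes",
    ]

# One invocation per data node that was in the cluster when the backup ran.
for node in (1, 2):
    print(" ".join(restore_cmd(node)))
```

Each generated command could then be run (for example via subprocess) on the appropriate host, one at a time per host per the rule of thumb above.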
The final step is to rebuild the indexes.
Step 3: Rebuild the Indexes
As we disabled the indexes while restoring the schema and data, it is necessary to recreate them. This is done in a similar way to restoring the schema – i.e. it should only be done for one node ID, for example:
shell$ ndb_restore --ndb-connectstring=localhost:1186 \
> --nodeid=1 --backupid=1803311603 \
> --backup_path=/dev/shm/backup/BACKUP/BACKUP-1803311603 \
> --rebuild-indexes
That's it. You can use the data again. But why was it that the indexes were disabled? Let me return to that.
Why Disable Indexes During the Restore?
There are two reasons to disable the indexes while restoring the schema and data:

Performance
Constraints (unique indexes and foreign keys)

As such, it is only necessary to disable the indexes while restoring the data, but there is no reason to create the indexes during the schema restore just to remove them again in the next step.
The first reason is performance. By disabling the indexes, there is no need to maintain them during the restore. This allows the data to be restored faster, though the indexes must then be rebuilt at the end. Overall this is still faster, and if BuildIndexThreads and the number of fragments per data node are greater than 1, the rebuild happens in parallel, as during a restart.
The second reason is that if you have unique keys or foreign keys, it is in general not possible to restore the backup with the indexes enabled. The backup happens in parallel across the data nodes, with the changes occurring during the backup recorded separately. When you restore the data, it is not possible to guarantee that data and log are restored in the same order as the changes occurred during the backup. So, to avoid unique key and foreign key errors, it is necessary to disable the indexes until the data has been restored.
Do not worry – this does not mean that the restored data will be inconsistent. At the end of the restore – and rebuilding the indexes checks for this – the constraints are fulfilled again.
Want to Know More?
This blog really only scratches the surface of backups. If you want to read more, some references are:

The MySQL Reference Manual:

Online Backup of NDB Cluster
ndb_mgm — The NDB Cluster Management Client
Commands in the NDB Cluster Management Client
ndb_restore — Restore an NDB Cluster Backup

Pro MySQL NDB Cluster (Apress)
This book by Mikiya Okuno and myself has a chapter (31 pages) dedicated to discussing backups and restores.


Multi-Source Replication Performance with GTID


In this blog post, we'll look at the performance of multi-source replication with GTID. Multi-source replication is a topology I've seen discussed recently, so I decided to look into how it performs with the different replication concepts. Multi-source replication uses replication channels, which allow a slave to replicate from multiple masters. This is a great […]

The post Multi-Source Replication Performance with GTID appeared first on Percona Database Performance Blog.

MongoDB 3.6 Retryable Writes . . . Retryable Writes


In this blog post, we will discuss MongoDB 3.6 Retryable Writes, a new application-level feature. Background From the beginning, MongoDB replica sets were designed to recover gracefully from many internal problems or events such as node crashes, network partitions/errors/interruptions, replica set member fail-overs, etc. While these events eventually recover transparently to the overall replica set, in […]

The post MongoDB 3.6 Retryable Writes . . . Retryable Writes appeared first on Percona Database Performance Blog.

Percona XtraBackup 2.4.10 Is Now Available


Percona announces the GA release of Percona XtraBackup 2.4.10 on March 30, 2018. This release is based on MySQL 5.7.19. You can download it from our download site and apt and yum repositories. Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot […]

The post Percona XtraBackup 2.4.10 Is Now Available appeared first on Percona Database Performance Blog.

Analyze MySQL Audit Logs with ClickHouse and ClickTail


In this blog post, I'll look at how you can analyze MySQL audit logs (Percona Server for MySQL) with ClickHouse and ClickTail. Audit logs are available with a free plugin for Percona Server for MySQL. Besides providing insights about activity on your server, you might need the logs for compliance purposes. However, on an active […]

The post Analyze MySQL Audit Logs with ClickHouse and ClickTail appeared first on Percona Database Performance Blog.

Using ProxySQL and VIRTUAL Columns to Solve ORM Issues

In this blog post, we’ll look at using ProxySQL and VIRTUAL columns to solve ORM issues.
There are a lot of web frameworks all around. Programmers and web designers are using them to develop and deploy any website and web application. Just to cite some of the most famous names: Drupal, Ruby on Rails, Symfony, etc.
Web frameworks are very useful tools. But sometimes, as with many human artifacts, they have issues. Every framework has its own queries to manage its internal tables. While there is nothing wrong with that, it often means these queries are not optimized.
Here is my case with Symfony 2 on MySQL 5.7, and how I solved it.
The sessions table issue
Symfony has a table to manage session data for users of the application. The table is defined as follows:
CREATE TABLE `sessions` (
`sess_id` varchar(126) COLLATE utf8_bin NOT NULL,
`sess_data` blob NOT NULL,
`sess_time` int(10) unsigned NOT NULL,
`sess_lifetime` mediumint(9) NOT NULL,
PRIMARY KEY (`sess_id`)
);
The expiration time of the user session is configurable. The developers decided to configure it to be one month.
Symfony was serving a high traffic website, and very soon that table became very big. After one month, I saw it had more than 14 million rows and was more than 3GB in size.
mysql> SELECT table_name, table_rows, data_length
    -> FROM information_schema.tables WHERE table_schema='symfony' AND table_name='sessions'\G
*************************** 1. row ***************************
TABLE_NAME: sessions
TABLE_ROWS: 14272158
DATA_LENGTH: 3306140672
Developers noticed the web application sometimes stalled for a few seconds. I analyzed the slow queries on MySQL and discovered that Symfony sometimes deletes inactive sessions by issuing the following query, which took several seconds to complete. This query was the cause of the stalls in the application:
DELETE FROM sessions WHERE sess_lifetime + sess_time < 1521025847
The query is not optimized. Let’s have a look at the EXPLAIN:
mysql> EXPLAIN DELETE FROM sessions WHERE sess_lifetime + sess_time < 1521025847\G
*************************** 1. row ***************************
id: 1
select_type: DELETE
table: sessions
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 14272312
filtered: 100.00
Extra: Using where
Every DELETE query was a full table scan of more than 14 million rows. So, let’s try to improve it.
First workaround
Looking around on the web and discussing it with colleagues, we found some workarounds, but none of them was a definitive solution:

Reduce the expiration time in the Symfony configuration. A good idea; one month is probably too long for a high-traffic website. But we kept the expiration time at one month because of an internal business policy. Even one week, though, wouldn't have avoided the full table scan.

Use a different database solution. Redis was proposed as an alternative to MySQL for session data. It might be a good solution, but it would involve a long deployment time. We planned a test, but the sysadmins argued against running another database system for such a simple task.

Patch the Symfony code. It was proposed to rewrite the query directly in the Symfony code. Discarded.

Create indexes. It was proposed to create indexes on the sess_time and sess_lifetime columns. But the indexes wouldn't be used because of the arithmetic addition in the WHERE clause, which is the only condition in the query.

So, what do we do if everything must remain the same? Same configuration, same environment, same query issued and no indexes added?
Query optimization using a virtual column
I focused on how to optimize the query. Since we were on MySQL 5.7, I thought about a generated virtual column. I decided to add a virtual column to the sessions table, defined as sess_time + sess_lifetime (the same expression as the query's condition):
mysql> ALTER TABLE sessions
ADD COLUMN `sess_delete` INT UNSIGNED GENERATED ALWAYS AS ((`sess_time` + `sess_lifetime`)) VIRTUAL;
Any virtual column can have an index on it. So, I created the index:
mysql> ALTER TABLE sessions ADD INDEX(sess_delete);
Note: I first checked that the INSERT queries were well written in Symfony (with an explicit list of the fields to insert), to make sure this modification wouldn't cause issues. Making a schema change on a table used by a framework, where the queries against the table are generally outside of your control, can be a daunting task.
So, let’s EXPLAIN the query rewritten as follows, with the condition directly on the generated indexed column:
mysql> EXPLAIN DELETE FROM sessions WHERE sess_delete < 1521025847\G
*************************** 1. row ***************************
id: 1
select_type: DELETE
table: sessions
type: range
possible_keys: sess_delete
key: sess_delete
key_len: 5
ref: const
rows: 6435
filtered: 100.00
Extra: Using where
The query can now use the index, and the number of rows examined is exactly the number of sessions we have to delete.
So far, so good. But will Symfony execute that query if we don’t want to modify the source code?
Using ProxySQL to rewrite the query
Fortunately, we already had ProxySQL up and running in our environment. We were using it just to manage the master MySQL failover.
One of the very useful features of ProxySQL is the ability to rewrite any query it receives into another one based on rules you define. Rules can range from very simple, like changing the name of a field, to very complex chains of rules; it depends on how complex a translation you have to do. In our case, we just needed to translate sess_time + sess_lifetime into sess_delete. The rest of the query stayed the same, so we needed only a very simple rule.
Let’s see how to create the rewrite rules.
Connect to the proxy:
mysql -u admin -psecretpwd -h -P6032 --prompt='Admin> '
Define the rewrite rule by inserting a record into the mysql_query_rules table:
Admin> INSERT INTO mysql_query_rules(rule_id,active,flagIN,match_pattern,negate_match_pattern,re_modifiers,replace_pattern,destination_hostgroup,apply)
    -> VALUES (
    -> 1,
    -> 1,
    -> 0,
    -> '^DELETE FROM sessions WHERE sess_lifetime \+ sess_time < (.*)',
    -> 0,
    -> 'CASELESS',
    -> 'DELETE FROM sessions WHERE sess_delete < \1',
    -> 0,
    -> 1);
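The effect of the match/replace pair, including the \1 backreference, can be illustrated with an ordinary regex substitution. A Python sketch using the re module as a stand-in for ProxySQL's regex engine:

```python
import re

# The rule's regex pair: capture the timestamp and carry it into the
# rewritten query via the \1 backreference.
match_pattern = r"^DELETE FROM sessions WHERE sess_lifetime \+ sess_time < (.*)"
replace_pattern = r"DELETE FROM sessions WHERE sess_delete < \1"

# The query Symfony actually issues.
query = "DELETE FROM sessions WHERE sess_lifetime + sess_time < 1521025847"

rewritten = re.sub(match_pattern, replace_pattern, query)
print(rewritten)  # DELETE FROM sessions WHERE sess_delete < 1521025847
```

The timestamp captured by (.*) is preserved, so the rewritten query deletes exactly the same sessions, just via the indexed sess_delete column.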
The two fields I want to focus on are:

match_pattern: it defines the query to be matched, using regular expression notation. The + symbol must be escaped with \ because it is a special character in regular expressions.

replace_pattern: it defines how to rewrite the matched query. \1 is the value of the parameter matched by match_pattern in (.*)

For the meaning of the other fields, have a look at the ProxySQL documentation.
Once created, we have to save the rule to disk and load it to runtime for it to take effect.
After that, the proxy began to filter the query and rewrite it to have a better execution plan using the index on the virtual column.
Note: pay attention when you need to upgrade the framework. If the upgrade rebuilds the database tables, you will lose the virtual column you created. Just remember to recreate it and check it after the upgrade.
Developers love using web frameworks because they are very powerful at simplifying the development and deployment of complex web applications. But for DBAs, the internal queries can sometimes cause a headache, because they are not well optimized or were never meant to run against your "huge" database. I solved my case using ProxySQL and VIRTUAL columns, with minimal impact on the architecture of the system and without patching any source code.
Take this post as a tip in case you face similar issues with your application framework.
The post Using ProxySQL and VIRTUAL Columns to Solve ORM Issues appeared first on Percona Database Performance Blog.

Approaching the Unacceptable Workload Boundary

layout: true


class: title
background-image: url(action-balance-fun-305250.jpg)
background-size: cover

Approaching the Unacceptable Workload Boundary
Baron Schwartz &bullet; SREcon18 Americas

class: img-right, bigger

Logistics & Stuff

Slides are at

Ask questions anytime.

Please get in touch: @xaprb or



class: bigger


What happens as systems get bigger and more heavily loaded?

* What is a system’s operating domain?

* How is load defined?

* Where is the load limit? How can you see it coming?

* How does the system behave near this limit?

* Can you measure and model this behavior?

background-image: url(nature-3258924-1280.jpg)
class: title


The Operating Domain


class: center, img-300h

Operating Domain and Failure Boundaries

Rasmussen's model describes an operating domain bounded by economic risk, effort, and
safety. The system's operating state is a point within the domain, always moving.

background-image: url(rasmussens-model.jpg)

class: img-450h, center

The Actual Boundaries Are Unknown

class: img-450h, center

We Draw Limits Where We Think It’s Safe

Margin of Error
Overdraft Protection

class: img-450h, center, two-column

The Buffer Zone Is Nonlinear



We think the gradient looks like this.

It really looks more like this.


class: bigger

Complex Systems Run In Degraded Mode

Richard Cook lists 18 precepts of system failure in How Complex Systems Fail.
Precepts 4) and 5) are especially relevant.

4) Complex systems contain changing mixtures of failures latent within them.
The complexity of these systems makes it impossible for them to run without
multiple flaws being present.

5) Complex systems run in degraded mode.
A corollary to the preceding point is that complex systems run as broken systems.


Systems can and do function beyond their load limits.

class: title
background-image: url(gears-1236578-1280.jpg)


System Load


class: bigger

What Is The Definition Of Load?

There’s no one right answer to this question, but there’s a useful answer
for this discussion.

Load is the sum of task residence times during an observation interval
\(T\). This is equivalent to the average concurrency of tasks queued or in service:

N = \frac{\sum{R}}{T}

You can prove this with Little’s Law.
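As a quick numeric sanity check of the definition (the numbers are invented):

```python
# Little's Law sketch: load N as summed residence times over the interval T.
residence_times = [2.0, 4.0, 1.0, 6.0, 2.0]  # seconds each task spent in the system
T = 10.0  # observation interval in seconds

N = sum(residence_times) / T  # average concurrency over the interval
print(N)  # 1.5 -> on average 1.5 tasks were in flight
```

Fifteen task-seconds of residence time observed over a ten-second window means 1.5 tasks in the system on average.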

class: bigger

Load, Utilization, And Queueing

Load (concurrency) is related to utilization and queue length, but it’s not
the same.

* Concurrency is the number of requests in process simultaneously.

* Average concurrency is an average over an observation interval \(T\).

* Utilization is the fraction of \(T\) that was busy.

* Queue length is the instantaneous or time-averaged number of tasks waiting
to be serviced.

class: bigger

Utilization, Queue Length, & Concurrency

By Little’s Law, utilization and queue length are types of concurrency.

* Utilization is the concurrency of in-service tasks.

* Queue length is the concurrency of queued tasks.

class: two-column, bigger

What Is The Load Limit?

If the load limit were defined in terms of utilization, queueing theory could
tell us where the load limit will be.

But it can’t: load can be infinite, utilization ranges 0-1.

Plus it’s impractical:
* The “hockey stick” queueing curve is hard to use
* The “knee” is unintuitive



This is appealing because utilization has a clear limit: it can't be more than 100%.

So we need to translate the problem to a different domain, where the units
match. Scalability is the answer.

class: title
background-image: url(snow-3260088-1280.jpg)




class: bigger

What’s the Definition of Scalability?

There's a mathematical definition of scalability as a function of load (concurrency).

I’ll illustrate it in terms of a parallel processing system that uses
concurrency to achieve speedup.

It’s practical, easy to use, and matches the domain well.

I’ll show how the equation is composed piece by piece, but don’t sweat the math.

class: bigger, img-center

Linear Scaling

Suppose a clustered system can complete X tasks per second with no parallelism.

With parallelism, it divides tasks and executes subtasks
concurrently, completing tasks faster.

Faster completion also means increased throughput.

* Tasks per second is throughput.
* Throughput is a function of concurrency.

class: bigger, img-center

Linear Scaling

Ideally, throughput increases linearly with concurrency.

* Linear scaling is the ideal.
* Another way to say this is that the system’s output is a linear function of
concurrency.

class: two-column, bigger

The Linear Scalability Equation

The equation that describes ideal scaling is

X(N) = \frac{\lambda N}{1}

where the slope is \(\lambda=X(1)\).



– X is throughput
– N is concurrency, which is the workload
– Lambda is the system’s output when there’s no parallelism
– Really important to note that N is the independent parameter, the driver

class: center, bigger

But Our Cluster Isn’t Perfect

Linear scaling comes from subdividing tasks perfectly.

What if a portion isn’t subdividable?

class: two-column,bigger

Amdahl’s Law Describes Serialization

X(N) = \frac{\lambda N}{1+\sigma(N-1)}

Amdahl’s Law describes throughput when
a fraction \(\sigma\) of the work can’t be parallelized.



class: bigger

Amdahl’s Law Has An Asymptote

X(N) = \frac{\lambda N}{1+\sigma(N-1)}

Parallelism delivers speedup, but there’s a limit:

\lim_{N \to \infty}{X(N)} = \frac{\lambda}{\sigma}

e.g. a 5% serialized task can’t be sped up more than 20-fold.

If 5% of the work is serialized, infinite concurrency will still result in tasks
taking 5% as long as non-parallelized tasks.
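As a sketch, Amdahl’s Law can be evaluated directly. With \(\lambda = 1\), throughput reads as speedup; the parameter values are illustrative:

```python
# Amdahl's Law: X(N) = lambda * N / (1 + sigma * (N - 1)).
# With lam=1 the result is the speedup factor; sigma=0.05 means 5% of
# the work is serialized, so speedup is capped at 1/sigma = 20x.
def amdahl(N, lam=1.0, sigma=0.05):
    return lam * N / (1 + sigma * (N - 1))

print(amdahl(10))    # ~6.9x with 10-way parallelism
print(amdahl(1000))  # ~19.6x, creeping toward the 20x asymptote
```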

class: img-center, bigger

What If Workers Coordinate?

Suppose the parallel workers also have dependencies on each other?

class: two-column, bigger, img-center, img-300h

How Bad Is Coordination?

\(N\) workers = \(N(N-1)\) pairs of interactions, which is
\(\mathcal{O}(N^2)\) in \(N\).





class: two-column, bigger

The Universal Scalability Law

X(N) = \frac{\lambda N}{1+\sigma(N-1)+\kappa N(N-1)}

The USL adds a term for crosstalk, multiplied by the \(\kappa\) coefficient.

Now there’s a point of diminishing returns!



Crosstalk is also called coordination or coherence.
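A sketch of the USL curve with illustrative, made-up parameters. The point of diminishing returns falls at \(N^* = \sqrt{(1-\sigma)/\kappa}\), which follows from setting the derivative of \(X(N)\) to zero:

```python
import math

# The USL: X(N) = lam * N / (1 + sigma*(N-1) + kappa*N*(N-1)).
# lam, sigma, and kappa below are illustrative, not measured values.
def usl(N, lam=1000.0, sigma=0.02, kappa=0.0004):
    return lam * N / (1 + sigma * (N - 1) + kappa * N * (N - 1))

# Peak concurrency: past this point, adding load *reduces* throughput.
n_star = math.sqrt((1 - 0.02) / 0.0004)
print(round(n_star, 1))      # ~49.5
print(usl(49) > usl(100))    # True: more concurrency, less throughput
```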

class: bigger, img-center

You Already Know This

You’ve seen lots of benchmarks with diminishing returns.


By the way, pay attention to the axis scale: it’s log-scaled by powers of two.
If you scale the X-axis linearly you’ll get the shape of the curve on the
previous slide.

class: img-center, bigger, img-300h

The USL Describes Behavior Under Load

The USL explains the highly nonlinear behavior we know systems exhibit near
their saturation point.

– Serialization (red) grows slowly, but crosstalk (blue) grows rapidly.
– This is why systems get so unpredictable near their limits.
– Near and above the point of diminishing returns, systems exhibit high variance
and get unpredictable.

class: bigger

A Summary Of The USL

The Universal Scalability Law defines throughput as a function of concurrency.

It explains how and why systems don’t scale linearly with load.

class: bigger

What is the USL Good For?

Armed with the USL, you are ready to:

Measure and model nonlinear behavior.
Predict the onset of nonlinearity.
Design better systems.

It’s easy. Let’s see how!

class: title
background-image: url(compass-2946958_1280.jpg)


How To Measure, Model, And Predict


class: bigger

What To Measure

You can’t measure serialization & crosstalk directly.

Instead, measure throughput and concurrency.

Then fit the USL model to the data to estimate the parameters.
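As an illustration of the fitting step, here is a deliberately simple sketch that recovers \(\sigma\) and \(\kappa\) from synthetic (concurrency, throughput) points by coarse grid search. A real workflow would use a proper solver (R, gnuplot, or the Excel model mentioned later); the data here is fabricated for the example:

```python
# Fit the USL to (N, X) measurements by minimizing squared error.
# A coarse grid search stands in for a real nonlinear solver.
def usl(N, lam, sigma, kappa):
    return lam * N / (1 + sigma * (N - 1) + kappa * N * (N - 1))

# Synthetic "measurements" generated with sigma=0.05, kappa=0.001, lam=100.
data = [(n, usl(n, 100.0, 0.05, 0.001)) for n in (1, 2, 4, 8, 16, 32, 64)]

lam = data[0][1]  # X(1) pins down lambda directly
best = min(
    ((s / 1000.0, k / 10000.0)
     for s in range(0, 200) for k in range(0, 50)),
    key=lambda p: sum((x - usl(n, lam, p[0], p[1])) ** 2 for n, x in data),
)
print(best)  # recovers (0.05, 0.001)
```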

class: center, middle



Throughput is so trivially easy to measure in most systems that I won’t talk
about it. But there are two easy ways to measure concurrency.

class: bigger

How To Measure Concurrency, Pt. 1

Many systems have a metric of concurrency already.
Look for a metric of things actively working.

MySQL: SHOW STATUS LIKE 'Threads_running'

Apache: active worker count

It works well to poll this, e.g. once per second, then aggregate the samples
into 1- or 3-minute averages.
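A sketch of that aggregation, assuming `samples` stands in for 1-second readings of a concurrency gauge such as `Threads_running`:

```python
# Collapse 1-second polls of a concurrency gauge into 1-minute averages,
# which are better suited to scatterplotting and USL fitting.
samples = [3, 5, 4, 6, 2, 4] * 20  # 120 made-up one-second samples (2 minutes)

minute_averages = [
    sum(samples[i:i + 60]) / 60
    for i in range(0, len(samples), 60)
]
print(minute_averages)  # [4.0, 4.0]
```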

class: bigger

How To Measure Concurrency, Pt. 2

If there’s no metric of concurrency, you can sum up latencies and divide by
the duration.

N = \frac{\sum{R}}{T}

– Again, in my experience it’s good to use averages over a moderately long window like 1-5 minutes.
– You want to end up with dozens to hundreds of data points.

class: img-center, img-450h

Plot Your Data

Simply scatterplot your data and eyeball it for sanity.

Source data in “row16.csv” file. If you’re reading this note and you’re not a
VividCortex employee, sorry, I can’t give you access to this data.

class: bigger, img-450h

Plug The Data Into The Model

Paste the data into the Excel
model I built.

You can do it in R, or gnuplot, or even with JavaScript in Plotly. Lots of
options. This is an easy one.

class: bigger

Interpreting The Results

What does the output mean?

– Shows whether your system has more serialization or crosstalk.

– Shows the estimated max load where it’ll stop scaling.

– Helps you predict nonlinearity.

class: bigger, img-center

Paypal’s NodeJS vs Java Benchmarks

Paypal’s NodeJS vs Java benchmarks are a good example!

class: bigger, img-300h, img-center

Bringing It Back To The Operating Domain

The USL is one way to understand what happens near this boundary.

class: bigger, two-column

What Happens Here?

– When the system approaches workload limits it gets twitchy.
– You may be able to see this approaching before it gets bad.
– Simply scatterplotting throughput vs concurrency is super useful!



class: two-column, bigger

You Don’t Need To Do Any Modeling!

Let’s take another look at this data. What jumps out?



class: two-column, bigger

What If You Had Only The First Part?

– I would model and project out to the right.
– I’d see “hmm, it’s leveling off.”
– I’d say “don’t count on much more than you see now.”



class: two-column, bigger

Think Differently About Outlying Points

– Given all the data, I mentally cluster it into two parts.
– If the high-end outliers deviate, it’s nonlinear already.
– Those points are evidence that the system is struggling there.
– You don’t need to model anything to see that.



class: center, two-column, bigger


Some Resources

I wrote a

I created an Excel

These slides are at



ProxySQL behind a load balancer in Google Cloud

In this article we will explore one approach for deploying ProxySQL behind a load balancer in Google Cloud.
While considering the deployment of ProxySQL, one has basically the following options:

Install ProxySQL on existing application server(s)
Provision dedicated ProxySQL server(s) between your application servers and the database layer.

Each approach has its pros and cons, but if there’s a significant number of application servers (more than a dozen or so) having a dedicated ProxySQL “layer” can be a more attractive option, especially if there is no service discovery mechanism in place (e.g. Consul).
Let’s consider a simple scenario, with a master and a small number of slaves in a single geographic region. Assuming you are following best practices, your database servers should be split across different availability zones. So for ProxySQL, it also makes sense to have at least 3 instances (again, in different availability zones).
Here’s how that would look:

ProxySQL behind a load balancer in Google Cloud

Getting started
Let’s start by creating some basic infrastructure for our POC from Google Cloud shell.
Network infrastructure
You can skip this part if you already have a network configuration in place.
1. Create a custom VPC network
gcloud compute networks create my-custom-network --subnet-mode custom
2. Create a new subnet in your custom VPC network
gcloud compute networks subnets create my-custom-subnet \
--network my-custom-network \
--range 10.240.0.0/16 \
--region us-central1
3. Configure a firewall rule to allow all traffic within the subnet
gcloud compute firewall-rules create allow-all-10-240-0-0-16 \
--network my-custom-network \
--source-ranges 10.240.0.0/16 \
--allow tcp,udp,icmp
4. Create a firewall rule to allow ssh, mysql, icmp traffic from anywhere to the custom network (optional)
gcloud compute firewall-rules create allow-tcp22-tcp3306-icmp \
--network my-custom-network \
--allow tcp:22,tcp:3306,icmp
ProxySQL instances
Now let’s create some instances to install ProxySQL. I will skip the actual steps of installing and configuring ProxySQL for the sake of brevity. Check the official doc for more on this.
1. Create 3 ProxySQL instances on different zones
gcloud compute instances create tst-proxysql01 \
--image-family debian-9 \
--image-project debian-cloud \
--tags proxysql-lb \
--zone us-central1-a \
--subnet my-custom-subnet

gcloud compute instances create tst-proxysql02 \
--image-family debian-9 \
--image-project debian-cloud \
--tags proxysql-lb \
--zone us-central1-b \
--subnet my-custom-subnet

gcloud compute instances create tst-proxysql03 \
--image-family debian-9 \
--image-project debian-cloud \
--tags proxysql-lb \
--zone us-central1-c \
--subnet my-custom-subnet
Now we will create the instance groups. One could set the group auto-scaling properties to better manage the instances, but this is out of the scope of this article.
2. Create 3 instance groups for ProxySQL instances on each zone
gcloud compute instance-groups unmanaged create us-proxysql-ig1 \
--zone us-central1-a

gcloud compute instance-groups unmanaged create us-proxysql-ig2 \
--zone us-central1-b

gcloud compute instance-groups unmanaged create us-proxysql-ig3 \
--zone us-central1-c
3. Add ProxySQL instances to the appropriate instance group
gcloud compute instance-groups unmanaged add-instances us-proxysql-ig1 \
--instances tst-proxysql01 \
--zone us-central1-a

gcloud compute instance-groups unmanaged add-instances us-proxysql-ig2 \
--instances tst-proxysql02 \
--zone us-central1-b

gcloud compute instance-groups unmanaged add-instances us-proxysql-ig3 \
--instances tst-proxysql03 \
--zone us-central1-c
ProxySQL behind a load balancer
Health checks
The first thing we need to configure is the health check(s). This is what will let the load balancer know which ProxySQL instances are “healthy”.
We could use a simple TCP check here, so when the TCP ACK is received the member is marked healthy. The problem is there have been (rare) cases of ProxySQL being unresponsive while the TCP ACK is still being returned by the operating system. So it is a better idea to check for an actual response string from ProxySQL.
I’ve noticed that ProxySQL returns a letter J on the first line of the response, so I’ve decided to use that in the response string to validate ProxySQL is alive. I’ve played a bit with more complex response strings but haven’t been able to make them work. If you come up with a better alternative, please let me know in the comments section!
I am using the ProxySQL admin port for the health check, but any ProxySQL port will also work.
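As an illustration (not part of the GCP setup), here is roughly what such a check amounts to: connect over TCP, read the first bytes, and verify they start with the expected string. The host and port are illustrative, and the “J” expectation is just the observation described above:

```python
import socket

# Sketch of what the load balancer's TCP health check does: connect to
# ProxySQL's admin port and verify the response starts with "J".
def proxysql_alive(host, port=6032, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            first_bytes = sock.recv(64)
            return first_bytes.startswith(b"J")
    except OSError:  # refused, timed out, unreachable, etc.
        return False
```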
1. Configure the load balancer health checks for ProxySQL ports
gcloud compute health-checks create tcp my-proxysql-health-check \
--port 6032 \
--response "J"
Backend service
The next step is creating the backend service and adding the instance groups to it.
I am using a session affinity setting, so all connections from a single application server are routed to the same ProxySQL instance. Feel free to take that parameter out.
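Conceptually, client-IP affinity behaves like hashing the client address to pick a consistent backend. This is a purely illustrative sketch (the load balancer implements this internally; only the instance names come from the setup above):

```python
import hashlib

# Illustrative only: client-IP session affinity conceptually maps each
# client address to a stable backend, e.g. by hashing the address.
backends = ["tst-proxysql01", "tst-proxysql02", "tst-proxysql03"]

def pick_backend(client_ip):
    digest = hashlib.md5(client_ip.encode()).digest()
    return backends[digest[0] % len(backends)]

# The same client always lands on the same ProxySQL instance:
print(pick_backend("10.240.1.15") == pick_backend("10.240.1.15"))  # True
```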
2. Create the backend service
gcloud compute backend-services create my-proxysql-lb \
--load-balancing-scheme internal \
--region us-central1 \
--health-checks my-proxysql-health-check \
--protocol tcp \
--session-affinity CLIENT_IP
3. Add the instance groups to backend
gcloud compute backend-services add-backend my-proxysql-lb \
--instance-group us-proxysql-ig1 \
--instance-group-zone us-central1-a \
--region us-central1

gcloud compute backend-services add-backend my-proxysql-lb \
--instance-group us-proxysql-ig2 \
--instance-group-zone us-central1-b \
--region us-central1

gcloud compute backend-services add-backend my-proxysql-lb \
--instance-group us-proxysql-ig3 \
--instance-group-zone us-central1-c \
--region us-central1
Forwarding rules
Now we need to create the forwarding rule of the load balancer. Note that if you don’t specify an IP address via the --address parameter, one will be auto-generated for you.
4. Create the forwarding rule
gcloud compute forwarding-rules create my-proxysql-lb-forwarding-rule \
--load-balancing-scheme internal \
--ports 3306 \
--network my-custom-network \
--region us-central1 \
--backend-service my-proxysql-lb \
--subnet my-custom-subnet

Created [[PROJECT_ID]/regions/us-central1/forwardingRules/my-proxysql-lb-forwarding-rule].
IPProtocol: TCP
Firewall rules
We need some firewall rules so application servers are allowed to reach the ProxySQL servers. Note that we don’t need a specific rule for the load balancer IP address; the tag used for the backends is sufficient.
We also need a rule to allow health checks to happen. This requires whitelisting some Google-owned internal IP ranges.
1. Add a firewall rule to allow traffic to the load balancer, and from the load balancer to the backends
gcloud compute firewall-rules create allow-proxysql-lb \
--network my-custom-network \
--source-ranges 10.240.0.0/16 \
--target-tags proxysql-lb \
--allow tcp:3306
2. Add a firewall rule to allow the health checks to happen
gcloud compute firewall-rules create allow-proxysql-health-check \
--network my-custom-network \
--source-ranges 130.211.0.0/22,35.191.0.0/16 \
--target-tags proxysql-lb \
--allow tcp:6032
Wrapping up
The next step is testing that you can reach the ProxySQL instances via the load balancer.
First, let’s check how the backends look:
gcloud compute backend-services get-health my-proxysql-lb –region=us-central1

- healthState: HEALTHY
port: 80
kind: compute#backendServiceGroupHealth

- healthState: HEALTHY
port: 80
kind: compute#backendServiceGroupHealth

- healthState: HEALTHY
port: 80
kind: compute#backendServiceGroupHealth
I am not sure why port 80 is reported, but other than that, all backends seem to be healthy.
Now let’s try connecting a MySQL client through the load balancer’s IP:
[root@tst-adm01 ~]# mysql -h -uapp_rw -p 
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5461327
Server version: 5.5.30 (ProxySQL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select @@hostname;
+--------------+
| @@hostname   |
+--------------+
| tst-mysql-01 |
+--------------+
1 row in set (0.05 sec)
You can see I was able to reach a MySQL server called tst-mysql-01, which I had previously configured in ProxySQL.
I hope you found this article useful for configuring ProxySQL behind a load balancer. If you are interested in learning more about ProxySQL, I suggest you also check out the following articles on the Pythian blog:
The State Of MySQL High Availability Going In To 2018
Using ProxySQL To Validate MySQL Updates
If you have any suggestions/improvements please let me know in the comments section below.
