Author: MySQL Performance Blog

Upcoming Webinar Wednesday August 16: Lock, Stock and Backup – Data Guaranteed

Join Percona’s, Technical Services Manager, Jervin Real as he presents Lock, Stock and Backup: Data Guaranteed on Wednesday, August 16, 2017 at 7:00 am PDT / 10:00 am EDT (UTC-7).
Reserve Your Spot

Backups are crucial in a world where data is digital and uptime is revenue. Environments are no longer bound to traditional data centers, and span multiple cloud providers and many heterogeneous environments. We need bulletproof backups and impeccable recovery processes. This talk aims to answer the question “How should I backup my MySQL databases?” by providing 3-2-1 backup designs, best practices and real-world solutions leveraging key technologies, automation techniques and major cloud provider services.
Register for the webinar here.

Jervin Real

As Technical Services Manager, Jervin partners with Percona’s customers on building reliable and highly performant MySQL infrastructures while also doing other fun stuff like watching cat videos on the internet. Jervin joined Percona in April 2010. Starting as a PHP programmer, Jervin quickly learned the LAMP stack. He has worked on several high-traffic sites and a number of specialized web applications (such as mobile content distribution). Before joining Percona, Jervin also worked with several hosting companies, providing care for customer hosted services and data on both Linux and Windows.

Learning MySQL 5.7: Q & A

In this post I’ll answer questions I received in my Wednesday, July 19, 2017, webinar Learning MySQL 5.7!
First, thank you all who attended the webinar. The link to the slides and the webinar recording can be found here.
I received a number of interesting questions in the webinar that I’ve followed up with below.
Would there be a big difference on passing from 5.1 to 5.6 before going to 5.7 or, at this point, would it be roughly the same?
The biggest risk of jumping between versions, in this case 5.1 to 5.6, is reverting in case of problems. Rollbacks don’t happen often, but they do happen and you have to make sure you have the infrastructure in place whenever you decide to execute. These upgrade steps are not officially supported by Oracle nor even recommended here at Percona. Having said that, as long as your tests (checksums, pt-upgrade) and rollback plan works, this shouldn’t be a problem.
One unforgettable issue I have personally encountered is an upgrade from 5.1 via dump and reload to 5.6. The 5.6 version ran with ROW binlog format preventing replication back to 5.1 because of the limitation with the TIMESTAMP columns. Similarly, downgrading without replication means you have to deal with changes to the MySQL system schema, which obviously require some form of downtime.
Additionally, replication from 5.7 to 5.5 will not work because of the additional metadata information that 5.7 creates (i.e., GTID even when GTID is disabled).
After in-place upgrade a Percona XtraDB Cluster from 5.5 to 5.7 (through 5.6), innodb_file_per_table is enabled by default and the database is now almost twice the size. It was a 40 GB DB now it’s 80 GB due to every table has its own file but ibdata1 is still 40 GB. Is there any solution for this (that doesn’t involve mysqldump and drop tables) and how can this be avoided in future upgrades?
The reason this might be the case is that after upgrading, a number (or possibly all) of tables were [re]created. This would obviously create separate tablespaces for each. One way I can think of reclaiming that disk space is through a familiar upgrade path:

Detach one of the nodes and make is an async replica of the remaining nodes in the cluster
Dump and reload data from this node, then resume replication
Join the other nodes from the cluster as additional nodes of a new cluster using the async replica
Once there is only one node remaining in the original cluster, you can switch to the new cluster for production
Rejoin the last node from the original cluster into the new cluster to complete the process

Depending on the semantics of your switch, it may or may not involve a downtime. For example, if you use ProxySQL this should be a transparent operation.
One way to avoid this problem is by testing. Testing the upgrade process in a lab will expose this kind of information even before deploying the new version into production, allowing you to adjust your process accordingly.
What is a possible impact on upgrades going from the old table format to Barracuda?
So far I am not aware of any negative impact – except if you upgrade and need to downgrade but have since created indexes with prefixes larger than what was supported on the previous version (see large_index_prefix and Barracuda documentation).
Upgrading to Barracuda and one of the supported row formats specifically allows memory constrained systems to save a little more. With BLOB/TEXT column stored off the page, they will not fill the buffer pool unless they are needed.
How do you run mysql_upgrade in parallel?
Good question, I actually wrote about it here.
Can you elaborate on ALTER progress features, and is it also applicable to “Optimization ” query?
I was not able to get more details on the “Optimization” part of this question. I can only assume this too was meant to be table rebuild via OPTIMIZE TABLE. First I would like to point out that OPTIMIZE has been an online DDL operation from 5.6 (with few limitations). As such, there is almost no point in monitoring. Also, for the cases where the online DDL does not apply to OPTIMIZE, under the hood, this is ALTER TABLE .. FORCE – a full table rebuild.
Now, for the actual ALTER process doing a table copy/rebuild, MySQL 5.7 provides some form of progress indication as to how much work has been done. However, it does not necessarily provide an estimate of the actual time it would take to complete. Each ALTER process has different phases which can vary under different conditions. Alternatively, you can also employ other ways of monitoring progress as described in the post.
We are migrated from 5.7.11 to 5.7.17 Percona Server and facing “Column 1 of table ‘x.x’ cannot be converted from type ‘varchar(100)’ to type ‘varchar(100)’”.
This is interesting – what we have seen so far are errors with different datatypes or sizes, which most likely means inconsistency from the table structures if the error is coming from replication. We will need more information on what steps were taken during the upgrade to tell what happened here. Our forums would be the best place to continue this conversation. To begin with, perhaps slave_type_conversions might help if the table structures in replication are the same.
Is the Boost Geometry almost on par with Postgres GIS functions?
I cannot answer this with authority or certainty. I’ve used GIS functions in MySQL, but not developed code for it. Although Boost::Geometry was chosen because of its well-designed API, rapid development and license compatibility, it does not necessarily mean it is more mature than PostGIS (which is widely adopted).
What is the best bulk insert method for MySQL 5.7?
The best option can be different in many situations, so we have to put context here. For this reason, let me give some example scenarios and what might work best:

On an upgrade process where you are doing a full dump and reload, parallelizing the process by using mydumper/myloader or mysqlpump will save a lot of time depending the hardware resource available.
Bulk INSERT from your application that happens at regular intervals – multi-row inserts are always ideal to reduce disk writes per insert. LOAD DATA INFILE is also a popular option if you can.

Again, thank you for attending the webinar – if you have additional questions head on out to the Percona Forums!

How to Configure Aurora RDS Parameters

In this blog post, we’ll look at some tips on how to configure Aurora RDS parameters.
I was recently deploying a few Aurora RDS instances, a process very similar to configuring a regular RDS instance. I noticed a few minor differences in the way you configure Aurora RDS parameters, and very few articles on how the commands should be structured (for RDS as well as Aurora). The only real literature available is the official Amazon RDS documentation.
This blog provides a concise “how-to” guide to quickly change Aurora RDS parameters using the AWS CLI. Aurora retains the parameter group model introduced with RDS, with new instances having the default read only parameter groups. For a new instance, you need to create and allocate a new parameter group (this requires a DB reboot). After that, you can apply changes to dynamic variables immediately. In other words, the first time you add the DB parameter group you’ll need to reboot even if the variable you are configuring is dynamic. It’s best to create a new DB parameter group when initializing your clusters. Nothing stops you from adding more than one host to the same DB Parameter Group rather than creating one per instance.
In addition to the DB Parameter Group, each instance is also allocated a DB Cluster Parameter Group. The DB Parameter Group is used for instance-level parameters, while the DB Cluster Parameter Group is used for cluster-level parameters (and applies to all instances in a cluster). You’ll find some of the MySQL engine variables can only be found in the DB Cluster Parameter Group. Here you will find a handy reference of all the DB cluster and DB instance parameters that are viewable or configurable for Aurora instances.
To run these commands, you’ll need to have the “aws” cli tool installed and configured. Note that the force-failover option used for RDS instances doesn’t apply to Aurora. You should perform either a controlled failover or let Aurora handle this. Also, the group family to use for Aurora is “oscar5.6”. The commands to set this up are as follows:
aws rds create-db-parameter-group
–db-parameter-group-name percona-opt
–db-parameter-group-family oscar5.6
–description “Percona Optimizations”
aws rds modify-db-parameter-group
–db-parameter-group-name percona-opt
–parameters “ParameterName=max_connections,ParameterValue=5000,ApplyMethod=immediate”
# For each instance-name:
aws rds modify-db-instance –db-instance-identifier <instance-name>
aws rds reboot-db-instance
–db-instance-identifier <instance-name>
Once you create the initial DB parameter group, configure the variables as follows:
aws rds modify-db-parameter-group
–db-parameter-group-name <instance-name>
–parameters “ParameterName=max_connect_errors,ParameterValue=999999,ApplyMethod=immediate”
aws rds modify-db-parameter-group
–db-parameter-group-name <instance-name>
–parameters “ParameterName=max_connect_errors,ParameterValue=999999,ApplyMethod=immediate”
## Verifying change:
aws rds describe-db-parameters
–db-parameter-group-name aurora-instance-1
| grep -B7 -A2 ‘max_connect_errors’
Please keep in mind, it can take a few seconds to propagate changes to nodes. Give it a moment before checking the values with “show global variables”. You can configure the DB Cluster Parameter group similarly, for example:
# Create a new db cluster parameter group
aws rds create-db-cluster-parameter-group –db-cluster-parameter-group-name percona-cluster –db-parameter-group-family oscar5.6 –description “new cluster group”
# Tune a variable on the db cluster parameter group
aws rds modify-db-cluster-parameter-group –db-cluster-parameter-group-name percona-cluster –parameters “ParameterName=innodb_flush_log_at_trx_commit,ParameterValue=2,ApplyMethod=immediate”
# Allocate the new db cluster parameter to your cluster
aws rds modify-db-cluster –db-cluster-identifier <cluster_identifier> –db-cluster-parameter-group-name=percona-cluster
# And of course, for viewing the cluster parameters
aws rds describe-db-cluster-parameters –db-cluster-parameter-group-name=percona-cluster
I hope you find this article useful, please make sure to share with the community!

Saturation Metrics in PMM 1.2.0

One of the new graphs added to Percona Monitoring and Management (PMM) is saturation metrics. This blog post explains how to use the information provided by these graphs.
You might have heard about Brendan Gregg’s USE Method  (Utilization-Saturation-Errors) as a way to analyze the performance of any system. Our goal in PMM is to support this method fully over time, and these graphs take us one step forward.
When it comes to utilization, there are many graphs available in PMM. There is the CPU Usage graph:

There is also Disk IO Utilization:

And there is Network Traffic:

If you would like to look at saturation type metrics, there is classical the Load Average graph:

While Load Average is helpful for understanding system saturation in general, it does not really distinguish whether it is the CPU or Disk that is saturated. Load Average, as the name says, is already averaged — so we can’t really observe short saturation spikes with Load Average. It is averaged for at least one minute. Finally, the problem with Load Average is it does not keep the number of CPU cores/threads into account. Suppose I have a CPU-bound Load Average of 16, for example. That is quite a load and will cause high saturation and queueing if you have two CPU threads. But if you have 64 threads, then 16 becomes a trivial load with no saturation at all.
Let’s take a look at the Saturation Metrics graph:

It provides us two metrics: one showing the CPU load and another is showing the IO load.These values roughly correspond to  the “r” and “b” columns in VMSTAT output:

These are sampled every second and then averaged over the reporting interval.
We also normalize the CPU load by dividing the raw number of runnable processes by a number of threads available. “Rocky” has 56 threads, which is why the normalized CPU load is about one even though the number of runnable processes shown by VMSTAT is around 50.
We do not normalize the IO load, as systems can have multiple IO devices and a number of requests they can handle in parallel is largely unknown. If you want to understand specific IO device performance, you should check out the Disk Performance Dashboard.
Testing Saturation Metrics in Practice
Let’s see if saturation graphs indeed show us when CPU saturation is the issue. I will use a sysbench CPU test for illustration, run as:
sysbench cpu  –cpu-max-prime=100000 –threads=1 –time=60 run
This will use the said number of threads to execute compute jobs, each of which will compute the said number of prime numbers. If we have enough CPU resources available, with no saturation, the latency of executing such requests should be about the same. When we overload the system, so there are not enough CPU execution units to process everything in the parallel, the average latency should increase.   
root@ts140i:/mnt/data# sysbench cpu  –cpu-max-prime=100000 –threads=1 –time=300 run sysbench 1.0.7 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Prime numbers limit: 100000
Initializing worker threads…
Threads started!
General statistics:
   total time:                          300.0234s
   total number of events:              12784
Latency (ms):
        min:                                 23.39
        avg:                                 23.47
        max:                                 28.07
        95th percentile:                     23.52
        sum:                             300018.06
As we can see with one thread working, the average time it takes to handle a single request is 23ms. Obviously, there is no saturation happening in this case:

“Ts140i” has four CPU cores, and as you can see the Normalized CPU load stays below one. You may wonder why isn’t it closer to 0.25 in this case, with one active thread and four cores available? The reason is at exactly the time when the metrics are being captured, there often happen to be an additional two to three threads active to facilitate the process. They are only active for a very few milliseconds at the time, so they do not produce much load — but they tend to skew the number a little bit.
Let’s now run with four threads. The number of threads matches the number of CPU cores available (and it is true cores in this case, no hyperthreading). In this case, don’t expect too much increase in the event processing time.
root@ts140i:/mnt/data# sysbench cpu  –cpu-max-prime=100000 –threads=4 –time=300 run
sysbench 1.0.7 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 4
Initializing random number generator from current time
Prime numbers limit: 100000
Initializing worker threads…
Threads started!
General statistics:
   total time:                          300.0215s
   total number of events:              48285
Latency (ms):
        min:                                 24.19
        avg:                                 24.85
        max:                                 43.61
        95th percentile:                     24.83
        sum:                            1200033.93
As you see test confirms the theory – we have avg latency increased just by about 6%  with Normalized CPU load in saturation metrics mostly hovering between 1 and 2:

Let’s now do the test with 16 threads, which is four times more than available CPU cores. We should see the latency dramatically increase due to CPU overload (or saturation). The same will happen to your CPU bound MySQL queries if you have more concurrency than CPUs available.
root@ts140i:/mnt/data# sysbench cpu  –cpu-max-prime=100000 –threads=16 –time=300 run
sysbench 1.0.7 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 16
Initializing random number generator from current time
Prime numbers limit: 100000
Initializing worker threads…
Threads started!
General statistics:
   total time:                          300.0570s
   total number of events:              48269
Latency (ms):
        min:                                 27.83
        avg:                                 99.44
        max:                                189.05
        95th percentile:                    121.08
        sum:                            4799856.52
We can see it takes about four times longer to process each request due to CPU overload and queueing. Let’s see what saturation metrics tell us:

As you can see, Normalized CPU Load floats between four and five on the graph, consistent with saturation we’re observing.
You may ask does the CPU utilization graph help us here? Not really. You will see 100% CPU usage for both the run with four threads and 16 threads, while request latencies are completely different.   
As we can see from our test, Normalized CPU Load is very helpful for understanding when the CPU is overloaded. An overloaded CPU causes response times to increase and performance to degrade. Furthermore, you can use it to (roughly) see how serious the overload is. As a rule of thumb, if you see Normalized CPU saturation over two, it indicates your CPUs are overloaded.

Platform End of Life (EOL) Announcement for RHEL 5 and Ubuntu 12.04 LTS

Upstream platform vendors have announced the general end of life (EOL) for Red Hat Enterprise Linux 5 (RHEL 5) and its derivatives, as well as Ubuntu 12.04 LTS. With this announcement comes some implications to support for Percona software running on these operating systems.
RHEL 5 was EOL as of March 31st, 2017 and Ubuntu 12.04 LTS was end of life as of April 28th, 2017. Pursuant to our end of life policies, we are announcing that these EOLs will go into effect for Percona software on August 1st, 2017. As of this date, we will no longer be producing new packages, binary builds, hotfixes, or bug fixes for Percona software on these platforms.
We generally align our platform end of life dates with those of the upstream platform vendor. The platform end of life dates are published in advance on our website under the page Supported Linux Platforms and Versions.
Per our policies, Percona will continue to provide operational support for your databases on EOLed platforms. However, we will be unable to provide any bug fixes, builds or OS-level assistance if you encounter an issue outside the database itself.
Each platform vendor has a supported migration or upgrade path to their next major release.  Please reach out to us if you need assistance in migrating your database to your vendor’s supported platform – Percona will be happy to assist you.

What is MySQL Partitioning?

In this blog, we’ll quickly look at MySQL partitioning.
Partitioning is a way in which a database (MySQL in this case) splits its actual data down into separate tables, but still get treated as a single table by the SQL layer.
When partitioning, it’s a good idea to find a natural partition key. You want to ensure that table lookups go to the correct partition or group of partitions. This means that all SELECT, UPDATE, DELETE should include that column in the WHERE clause. Otherwise, the storage engine does a scatter-gather, and queries ALL partitions in a UNION that is not concurrent.
Generally, you must add the partition key into the primary key along with the auto increment, i.e., PRIMARY KEY (part_id,id). If you don’t have well-designed and small columns for this composite primary key, it could enlarge all of your secondary indexes.
You can partition by range or hash. Range is great because you have groups of known IDs in each table, and it helps when querying across partition IDs. This still can create hotspots in the newest partition, as all new inserts go there. Partitioning by hash “load balances” the table, and allows you to write to partitions more concurrently. This makes range queries on the partition key a bad idea.
In MySQL 5.7, partitioning became native to the store engine and deprecated the old method where MySQL itself had to handle the partitions. This means InnoDB partitions (and a larger amount of partitions) are a better choice than in the past.
As with all features and recommendations, this only makes sense if it helps your data and workload!

What is innodb_autoinc_lock_mode and why should I care?

In this blog post, we’ll look at what innodb_autoinc_lock_mode is and how it works.
I was recently discussing innodb_autoinc_lock_mode with some colleagues to address issues at a company I was working with.
This variable defines the lock mode to use for generating auto-increment values. The permissible values are 0, 1 or 2 (for “traditional”, “consecutive” or “interleaved” lock mode, respectively). In most cases, this variable is set to the default of 1.
We recommend setting it to 2 when the BINLOG_FORMAT=ROW. With interleaved, INSERT statements don’t use the table-level AUTO-INC lock and multiple statements can execute at the same time. Setting it to 0 or 1 can cause a huge hit in concurrency for certain workloads.
Interleaved (or 2) is the fastest and most scalable lock mode, but it is not safe if using STATEMENT-based replication or recovery scenarios when SQL statements are replayed from the binary log. Another consideration – which you shouldn’t rely on anyway – is that IDs might not be consecutive with a lock mode of 2. That means you could do three inserts and expect IDs 100,101 and 103, but end up with 100, 102 and 104. For most people, this isn’t a huge deal.
If you are only doing simple inserts, this might not help you. I did a sysbench test on MySQL 5.7 in Amazon RDS with 100 threads and found no difference in performance or throughput between lock modes 1 and 2. It helps the most when you when the number of rows can’t be determined, such as with INSERT INTO…SELECT statements.
You can find a longer form article in the manual, but I highly recommend setting this value to 2 if you are not using STATEMENT-based replication.

Percona XtraBackup 2.4.8 is Now Available

Percona announces the GA release of Percona XtraBackup 2.4.8 on July 24, 2017. You can download it from our download site and apt and yum repositories.
Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.
New features:

To avoid issues with MySQL 5.7 skipping redo log for DDL Percona XtraBackup has implemented three new options (–lock-ddl, –lock-ddl-timeout, and –lock-ddl-per-table) that can be used to place MDL locks on tables while they are copied.
New –check-privileges option has been implemented that can be used to check if Percona XtraBackup has all required privileges to perform the backup.

Bugs Fixed:

xtrabackup would hang with Waiting for master thread to be suspended message when backup was being prepared. Bug fixed #1671437.

xtrabackup would fail to prepare the backup with 6th page is not initialized message in case server didn’t properly initialize the page. Bug fixed #1671722.

xbstream could run out of file descriptors while extracting the backup which contains many tables. Bug fixed #1690823.
When a table was created with the DATA DIRECTORY option xtrabackup would back up the .frm and .isl files, but not the .ibd file. Due to the missing .ibd files backup then could not be restored. Bug fixed #1701736.

Percona XtraBackup incorrectly determined use of master_auto_postion on a slave, and thus generated invalid xtrabackup_slave_info file. Bug fixed #1705193.

Percona XtraBackup will now print a warning if it encounters unsupported storage engine. Bug fixed #1394493.

Percona XtraBackup would crash while backing up MariaDB 10.2.x with –ftwrl-* options. Bug fixed #1704636.
xtrabackup –slave-info didn’t write the correct information into xtrabackup_slave_info file when multi-source replication was used. Bug fixed #1551634.
Along with xtrabackup_checkpints file, xtrabackup now copies xtrabackup_info file into directory specified by –extra-lsndir option. Bug fixed #1600656.
GTID position was not recorded when –binlog-info option was set to AUTO. Bug fixed #1651505.

Release notes with all the bugfixes for Percona XtraBackup 2.4.8 are available in our online documentation. Please report any bugs to the launchpad bug tracker.

Faster Node Rejoins with Improved IST performance

In this blog, we’ll look at how improvements to Percona XtraDB Cluster improved IST performance.
Starting in version 5.7.17-29.20 of Percona XtraDB Cluster significantly improved performance. Depending on the workload, the increase in throughput is in the range of 3-10x. (More details here). These optimization fixes also helped improve IST (Incremental State Transfer) performance. This blog is aimed at studying the IST impact.
IST stands for incremental state transfer. When a node of the cluster leaves the cluster for a short period of time and then rejoins the cluster it needs to catch-up with cluster state. As part of this sync process existing node of the cluster (aka DONOR) donates missing write-sets to rejoining node (aka JOINER). In short, flow involves, applying missing write-sets on JOINER as it does during active workload replication.
Percona XtraDB Cluster / Galera already can apply write-sets in parallel using multiple applier threads. Unfortunately, due to commit contention, the commit action was serialized. This was fixed in the above Percona XtraDB Cluster release, allowing commits to proceed in parallel.
IST uses the same path for applying write-sets, except that it is more like a batch operation.
IST Performance
Let’s look at IST performance before and now.

Two node cluster (node-1 and node-2) and gcache is configured large enough to avoid purging as we need IST
Start workload against node-1 for 30 seconds
Shutdown node-2
Start workload that performs 4M requests against node-1. Workload produces ~3.5M write-sets that are cached in gcache and used later for IST
Start node-2 with N-applier threads
Wait until IST is done
….. repeat steps 3-6 with different values of N.


IST is 4x faster with PXC-5.7.17 (compared to previous releases)
Improved performance means a quicker node rejoin, and an overall increase in cluster productivity as joiner node is available to process the workload more quickly

Percona XtraDB Cluster 5.7.17 significantly improved IST performance. A faster re-join of the node effectively means better cluster productivity and flexibility in planning maintenance window. So what are you waiting for? Upgrade to Percona XtraDB Cluster 5.7.17 or latest Percona XtraDB Cluster 5.7 release and experience the power!

Blog Poll: What Operating System Do You Run Your Development Database On?

In this post, we’ll use a blog poll to find out what operating system you use to run your development database servers.
In our last blog poll, we looked at what OS you use for your production database. Now we would like to see what you use for your development database.
As databases grow to meet more challenges and expanding application demands, they must try and get the maximum amount of performance out of available resources. How they work with an operating system can affect many variables, and help or hinder performance. The operating system you use for your database can impact consumable choices (such as hardware and memory). The operating system you use can also impact your choice of database engine as well (or vice versa).
When new projects, new applications or services or testing new architecture solutions, it makes sense to create a development environment in order to test and run scenarios before they hit production. Do you use the same OS in your development environment as you do your production environment?
Please let us know what operating system you use to run your development database. For this blog poll, we’re asking which operating system you use to actually run your development database server (not the base operating system).
If you’re running virtualized Linux on Windows, please select Linux as the OS used for development. Pick up to three that apply. Add any thoughts or other options in the comments section:
Note: There is a poll embedded within this post, please visit the site to participate in this post’s poll.
Thanks in advance for your responses – they will help the open source community determine how database environments are being deployed.

TEL/電話+86 13764045638
QQ 47079569