Author: MySQL Performance Blog

Percona Server for MySQL 5.5.60-38.12 Is Now Available

Percona announces the release of Percona Server for MySQL 5.5.60-38.12 on May 18, 2018. Based on MySQL 5.5.60, including all the bug fixes in it, Percona Server for MySQL 5.5.60-38.12 is now the current stable release in the 5.5 series.
Percona Server for MySQL is open-source and free. Downloads are available here and from the Percona Software Repositories.
Bugs Fixed

The mysqldump utility with the --innodb-optimize-keys option worked incorrectly with foreign keys pointing to the same table, producing invalid SQL statements. Bugs fixed #1125 and #3863.
A typo in the plugin.cmake file prevented plugins from being compiled statically into the server. Bug fixed #3871 (upstream #89766).
Using the -DWITHOUT_<PLUGIN>=ON CMake variable to exclude a plugin from the build didn’t work for some plugins, including a number of storage engines. Bug fixed #3901.
A fix was introduced to remove GCC 8 compilation warnings for the Percona Server build. Bug fixed #3950.
A code clean-up was done to fix compilation warnings and errors specific to clang 6. Bug fixed #3893 (upstream #90111).

The Percona Server Debian package description included a reference to the /etc/mysql/my.cnf file, which is not actually present in these packages. Bug fixed #2046.
A clean-up in Percona Server binlog-related code was made to avoid uninitialized memory comparison. Bug fixed #3925 (upstream #90238).
The temporary file I/O was not instrumented for Performance Schema. Bug fixed #3937 (upstream #90264).
A key_block_size value was set automatically by the Improved MEMORY Storage Engine, which resulted in warnings when changing the engine type to InnoDB, and a constantly growing key_block_size during ALTER operations. Bugs fixed #3936, #3940, and #3943.

Other bugs fixed: #3767 “Fix compilation warnings/errors with clang”, #3778 “5.5 Tree received Percona-TokuBackup submodule where it should not”, #3794 “MTR test main.percona_show_temp_tables_stress does not wait for events to start”,  #3798 “MTR test innodb.percona_extended_innodb_status fails if InnoDB status contains unquoted special characters”, and #3926 “Potentially truncated bitmap file name in log_online_open_bitmap_file_read_only() (storage/innobase/log/log0online.cc)”.
Find the release notes for Percona Server for MySQL 5.5.60-38.12 in our online documentation. Report bugs in the Jira bug tracker.
The post Percona Server for MySQL 5.5.60-38.12 Is Now Available appeared first on Percona Database Performance Blog.

Installing MySQL 8.0 on Ubuntu 16.04 LTS in Five Minutes

Do you want to install MySQL 8.0 on Ubuntu 16.04 LTS? In this quick tutorial, I show you exactly how to do it in five minutes or less.
This tutorial assumes you don’t have MySQL or MariaDB installed. If you do, it’s necessary to uninstall them or follow a slightly more complicated upgrade process (not covered here).
Step 1: Install MySQL APT Repository
Ubuntu 16.04 LTS, also known as Xenial, comes with a choice of MySQL 5.7 and MariaDB 10.0.
If you want to use MySQL 8.0, you need to install the MySQL/Oracle Apt repository first:
wget https://dev.mysql.com/get/mysql-apt-config_0.8.10-1_all.deb
dpkg -i mysql-apt-config_0.8.10-1_all.deb
The MySQL APT repository installation package allows you to pick which MySQL version you want to install, as well as whether you want access to preview versions. Let’s leave everything at the defaults:

Step 2: Update repository configuration and install MySQL Server
apt-get update
apt-get install mysql-server
Note: Do not forget to run "apt-get update", otherwise you may get an older version of MySQL installed from the Ubuntu repository.
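You can double-check which repository will win before installing (apt-cache policy is a standard Ubuntu tool):
apt-cache policy mysql-server
If the repository was added correctly, the candidate version should come from repo.mysql.com rather than the stock Ubuntu archive.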
The installation process asks you to set a password for the root user:

I recommend you set a root password for increased security. If you do not set a password for the root account, “auth_socket” authentication is enabled. This ensures only the operating system’s “root” user can connect to MySQL Server without a password.
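For example, with auth_socket enabled, you would see behavior like this (a sketch; exact error messages may vary):
sudo mysql -u root   # succeeds via auth_socket because the OS user is root
mysql -u root        # fails for a regular OS user: access denied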
Next, the installation script asks you whether to use Strong Password Encryption or Legacy Authentication:

While Strong Password Encryption is recommended for security purposes, not all applications and drivers support this new authentication method yet. If yours do not, going with Legacy Authentication is the safer choice.
All Done
You should have MySQL 8.0 Server running. You can test it by connecting to it with a command line client:
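A minimal check looks like this (the exact version string will vary):
mysql -u root -p
mysql> SELECT VERSION();
If the client reports an 8.0.x version, the installation worked.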

As you can see, it takes just a few simple steps to install MySQL 8.0 on Ubuntu 16.04 LTS. Go ahead and give it a try!
The post Installing MySQL 8.0 on Ubuntu 16.04 LTS in Five Minutes appeared first on Percona Database Performance Blog.

Deploying PMM on Linode: Your $5-Per-Month Monitoring Solution

In this blog, I will show you how to install PMM on Linode as a low-cost database monitoring solution.
Many of my friends use Linode to run their personal sites, as well as small projects. While Linode is no match for Big Cloud providers in features, it is really wonderful when it comes to cost and simplicity: a Linode “nanode” instance offers 1GB of memory, 1 core, 20GB of storage and 1TB of traffic for just $5 a month.
A single Linode instance is powerful enough to use with Percona Monitoring and Management (PMM) to monitor several systems, so I use Linode a lot when I want to demonstrate PMM deployment through Docker, rather than Amazon Marketplace.
Here are step-by-step instructions to get you started with Percona Monitoring and Management (PMM) on Linode in five minutes (or less):
Step 1: Pick the Linode type and location, and launch it.

Step 2: Name your Linode
This step is optional and is not PMM-related, but you may want to give your Linode an easy-to-remember name instead of something like “linode7796908”. Click on Linode Name and then on “Settings” and enter a name in “Linode Label”.

Step 3: Deploy the Image
Click on Linode Name and then on “Deploy an Image”.

I suggest choosing the latest Ubuntu LTS version and allocating 512MB for the swap file, especially on a Linode with a small amount of memory. Remember to set a strong root password, as Linode allows root password login by default from any IP.
Step 4: Boot Linode
Now that the image is prepared, you need to boot your Linode. Click on the Boot button for that:

Step 5: Login to the system and install Docker
Use your favorite SSH client to log in to the Linode you created, using the "root" user and the password you set in Step 3, and install Docker:
apt install docker.io
Step 6: Run PMM Server
Here are detailed instructions to install PMM Server on Docker. Below are the commands for a basic installation:
docker pull percona/pmm-server:latest
docker create \
  -v /opt/prometheus/data \
  -v /opt/consul-data \
  -v /var/lib/mysql \
  -v /var/lib/grafana \
  --name pmm-data \
  percona/pmm-server:latest /bin/true
docker run -d \
  -p 80:80 \
  --volumes-from pmm-data \
  --name pmm-server \
  --restart always \
  percona/pmm-server:latest
Note: This deploys PMM Server without authentication. For anything but test usage, you should set a password by following instructions on this page.
You’re done!
You’ve now installed PMM Server and you can see it monitoring itself by going to the server IP with a browser.
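You can also verify it quickly from the shell (your Linode IP is a placeholder here):
docker ps --filter name=pmm-server   # the container should show as "Up"
curl -sI http://<your-linode-ip>/    # should return an HTTP response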

Now you can go ahead and install the PMM Client on the nodes you want to monitor!
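On each monitored node, the basic flow looks like this (a sketch assuming the PMM 1.x client package is already installed; the server IP and MySQL credentials are placeholders):
pmm-admin config --server <pmm-server-ip>
pmm-admin add mysql --user pmm --password <password>
pmm-admin list   # confirm which services are being monitored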
The post Deploying PMM on Linode: Your $5-Per-Month Monitoring Solution appeared first on Percona Database Performance Blog.

How Binary Logs (and Filesystems) Affect MySQL Performance

I want to take a closer look at MySQL performance with binary logs enabled on different filesystems, especially as MySQL 8.0 comes with binary logs enabled by default.
As part of my benchmarks of the MyRocks storage engine, I noticed an unusual variance in throughput for the InnoDB storage engine, even though we spent a lot of time making it as stable as possible in Percona Server for MySQL. In the end, the culprit turned out to be the enabled binary logs. There is also always the question, “If there is a problem with EXT4, does XFS perform differently?” To answer that, I will repeat the same benchmark on both the EXT4 and XFS filesystems.
You can find our previous experiments with binary logs here: https://www.percona.com/blog/2016/06/03/binary-logs-make-mysql-5-7-slower-than-5-6/.
Benchmark Setup
A short overview of the benchmark setup:

Percona Server for MySQL 5.7.21
InnoDB storage engine

In contrast to the previous benchmark, I enabled foreign keys, used the REPEATABLE-READ isolation level, and used UTF8 character sets. Because of these changes, the results are not really comparable with the previous results.

The dataset is the same: sysbench-tpcc with ten tables and 100 warehouses, resulting in a total of 1000 warehouses, and about a 90GB dataset size.

I will use innodb_buffer_pool_size values of 80GB, 70GB, and 60GB to emulate different IO loads and evaluate how they affect binary log writes.

Initial Results
For the first run, let’s check the results without binary logs vs. with binary log enabled, but with sync_binlog=0:

We can see that results without binary logs are generally better, but we can also see that with binary logs enabled and sync_binlog=0, there are regular drops to 0 tps for 1-2 seconds. This basically results in stalls in any connected application.
So, enabling binary logs may result in regular application stalls. The reason for this is that there is a limit on the size of the binary log file (max_binlog_size), which is 1GB. When the limit is reached, MySQL has to perform a binary log rotation. With sync_binlog=0, all previous writes to the binary log are cached in the OS cache, and during rotation, MySQL forces synchronous flushing of all changes to disk. This results in complete stalls every ~40 seconds (the amount of time it takes to fill 1GB of binary log in the above tests).
How can we deal with this? The obvious solution is to perform sync writes of binary logs more frequently. This can be achieved by setting sync_binlog > 0. The popular choice is the strictest setting, sync_binlog=1, which provides the most guarantees but also comes with a noted performance penalty. I will also test sync_binlog=1000 and sync_binlog=10000, which perform synchronous writes of binary logs every 1000 and 10000 transactions, respectively.
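Translated into configuration, a run with sync_binlog=1000 corresponds to a my.cnf fragment like this (a sketch; max_binlog_size is shown at the 1GB default discussed above):
[mysqld]
log_bin         = binlog
max_binlog_size = 1G
sync_binlog     = 1000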
The Results

The same results in tabular format, with median throughput (tps; more is better):

Buffer pool   sync_binlog=0   sync_binlog=1   sync_binlog=1000   sync_binlog=10000   no binlog
60 GB         4174.945        3598.12         3950.19            4205.165            4277.955
70 GB         5053.11         4541.985        4714               4997.875            5328.96
80 GB         5701.985        5263.375        5303.145           5664.155            6087.925

Some conclusions we can make:

sync_binlog=1 comes with the biggest performance penalty, but with minimal variance, comparable to running without binary logs.
sync_binlog=0 provides best (for enabled binary logs) performance, but the variance is huge.
sync_binlog=1000 is a good compromise, providing better performance than sync_binlog=1 with minimal variance.
sync_binlog=10000 shows less variance than sync_binlog=0, but the variance is still significant, so it might not be a good choice.

So what value should we use? This is probably a choice between sync_binlog=1 or some value like 1000. It depends on your use case and your storage solution. In the case of slow storage, sync_binlog=1 may show a bigger penalty compared to what I can see on my enterprise SATA SSD SAMSUNG SM863.
Filesystems
All of the above results were on an EXT4 filesystem. Let’s compare to XFS. Will it show different throughput and variance?

The median throughput in tabular format:

sync_binlog   Buffer pool (GB)   EXT4       XFS
0             60                 4174.945   3902.055
0             70                 5053.11    4884.075
0             80                 5701.985   5596.025
1             60                 3598.12    3526.545
1             70                 4541.985   4538.455
1             80                 5263.375   5255.38
1000          60                 3950.19    3620.05
1000          70                 4714       4526.49
1000          80                 5303.145   5150.11
10000         60                 4205.165   3874.03
10000         70                 4997.875   4845.85
10000         80                 5664.155   5557.61
No binlog     60                 4277.955   4169.215
No binlog     70                 5328.96    5139.625
No binlog     80                 6087.925   5957.015

We can observe the general trend that median throughput on XFS is a little worse than with EXT4, with practically identical variance.
The difference in throughput is minimal. You can use either XFS or EXT4.
Hardware Spec
Supermicro server:

Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
2 sockets / 28 cores / 56 threads
Memory: 256GB of RAM
Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
Filesystem: ext4/xfs
Percona-Server-5.7.21-20
OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic

Extra Raw Results, Scripts and Config

My goal is to provide fully repeatable benchmarks. To that effect, I’ve shared all the scripts and settings I used in the following GitHub repo:
https://github.com/Percona-Lab-results/201805-sysbench-tpcc-binlog-fs
The post How Binary Logs (and Filesystems) Affect MySQL Performance appeared first on Percona Database Performance Blog.

A Look at MyRocks Performance

In this blog post, I’ll look at MyRocks performance through some benchmark testing.
As the MyRocks storage engine (based on the RocksDB key-value store http://rocksdb.org ) is now available as part of Percona Server for MySQL 5.7, I wanted to take a look at how it performs on a relatively high-end server and SSD storage. I wanted to check how it performs for different amounts of available memory for the given database size. This is similar to the benchmark I published a while ago for InnoDB (https://www.percona.com/blog/2010/04/08/fast-ssd-or-more-memory/).
In this case, I plan to use a sysbench-tpcc benchmark (https://www.percona.com/blog/2018/03/05/tpcc-like-workload-sysbench-1-0/) and I will execute it for both MyRocks and InnoDB. We’ll use InnoDB as a baseline.
For the benchmark, I will use 100 TPC-C warehouses, with a set of 10 tables (to shift the bottleneck from row contention). This should give roughly 90GB of data size (when loaded into InnoDB) and is roughly equivalent to a 1000-warehouse data size.
To vary the memory size, I will change innodb_buffer_pool_size from 5GB to 100GB for InnoDB, and rocksdb_block_cache_size for MyRocks.
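In practice, that means changing a single line in my.cnf between runs (a sketch; the full configs are in the repository linked at the end of this post):
innodb_buffer_pool_size = 5G    # InnoDB runs: varied from 5G up to 100G
rocksdb_block_cache_size = 5G   # MyRocks runs: varied the same way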
For MyRocks, we will use LZ4 as the default compression on disk. The data size in the MyRocks storage engine is 21GB. It is interesting to note that the uncompressed size in MyRocks is 70GB on storage.
For both engines, I did not use FOREIGN KEYS, as MyRocks does not support them at the moment.
MyRocks does not support SELECT .. FOR UPDATE statements in REPEATABLE-READ mode in the Percona Server for MySQL implementation. However, “SELECT .. FOR UPDATE” is used in this benchmark. So I had to use READ-COMMITTED mode, which is supported.
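For reference, the isolation level can be set either in my.cnf or at runtime (standard MySQL 5.7 syntax):
transaction-isolation = READ-COMMITTED       # my.cnf
SET GLOBAL tx_isolation = 'READ-COMMITTED';  -- at runtime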
The most important setting I used was to enable binary logs, for the following reasons:

Any serious production deployment uses binary logs
With binary logs disabled, MyRocks is affected by a suboptimal transaction coordinator

I used the following settings for binary logs:

binlog_format = 'ROW'
binlog_row_image=minimal
sync_binlog=10000 (I am not using 0, as this causes serious stalls during binary log rotations, when the content of the binary log is flushed to storage all at once)

While I am not a full expert in MyRocks tuning yet, I used recommendations from this page: https://github.com/facebook/mysql-5.6/wiki/my.cnf-tuning. The Facebook-MyRocks engineering team also provided me input on the best settings for MyRocks.
Let’s review the results for different memory sizes.
This first chart shows throughput jitter. This helps to understand the distribution of throughput results. Throughput is measured every 1 second, and on the chart I show all measurements after 2000 seconds of a run (the total length of each run is 3600 seconds). So I show the last 1600 seconds of each run (to remove warm-up phases):

To better quantify results, let’s take a look at them on a boxplot. The quickest way to understand boxplots is to take a look at the middle line. It represents a median of measurements (see more at https://www.percona.com/blog/2012/02/23/some-fun-with-r-visualization/):

Before we jump to the summary of results, let’s take a look at a variation of the throughput for both InnoDB and MyRocks. We will zoom to a 1-second resolution chart for 100 GB of allocated memory:

We can see that there is a lot of variation with periodical 1-second performance drops with MyRocks. At this moment, I do not know what causes these drops.
So let’s take a look at the average throughput for each engine for different memory settings (the results are in tps, and more is better):

Memory, GB   InnoDB     MyRocks
5            849.0664   4205.714
10           1321.9     4298.217
20           1808.236   4333.424
30           2275.403   4394.413
40           2968.101   4459.578
50           3867.625   4503.215
60           4756.551   4571.163
70           5527.853   4576.867
80           5984.642   4616.538
90           5949.249   4620.87
100          5961.2     4599.143

This is where MyRocks behaves differently from InnoDB. InnoDB benefits greatly from additional memory, up to the size of the working dataset; after that, there is no reason to add more memory.
At the same time, interestingly, MyRocks does not benefit much from additional memory.
Basically, MyRocks performs as expected for a write-optimized engine. You can refer to my article How Three Fundamental Data Structures Impact Storage and Retrieval for more details. 
In conclusion, InnoDB performs better (compared to itself) when the working dataset fits (or almost fits) into available memory, while MyRocks can operate (and outperform InnoDB) on small memory sizes.
IO and CPU usage
It is worth looking at resource utilization for each engine. I took vmstat measurements for each run so that we can analyze IO and CPU usage.
First, let’s review writes per second (in KB/sec). Please keep in mind that these writes include binary log writes too, not just writes from the storage engine.

Memory, GB   InnoDB     MyRocks
5            244754.4   87401.54
10           290602.5   89874.55
20           311726     93387.05
30           313851.7   93429.92
40           316890.6   94044.94
50           318404.5   96602.42
60           276341.5   94898.08
70           217726.9   97015.82
80           184805.3   96231.51
90           187185.1   96193.6
100          184867.5   97998.26

 
We can also calculate how many writes per transaction each storage engine performs:
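Before looking at the chart, here is a back-of-the-envelope version of the same calculation, dividing the write rate above by the throughput from the earlier table:
InnoDB  @ 5GB:   244754.4 KB/s / 849.07 tps  ≈ 288 KB written per transaction
InnoDB  @ 100GB: 184867.5 KB/s / 5961.2 tps  ≈  31 KB written per transaction
MyRocks @ 5GB:    87401.5 KB/s / 4205.71 tps ≈  21 KB written per transaction
MyRocks @ 100GB:  97998.3 KB/s / 4599.14 tps ≈  21 KB written per transaction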

This chart shows the essential difference between InnoDB and MyRocks. MyRocks, being a write-optimized engine, uses a constant amount of writes per transaction.
For InnoDB, the amount of writes greatly depends on the memory size. The less memory we have, the more writes it has to perform.
What about reads?
The following table shows reads in KB per second.

Memory, GB   InnoDB     MyRocks
5            218343.1   171957.77
10           171634.7   146229.82
20           148395.3   125007.81
30           146829.1   110106.87
40           144707     97887.6
50           132858.1   87035.38
60           98371.2    77562.45
70           42532.15   71830.09
80           3479.852   66702.02
90           3811.371   64240.41
100          1998.137   62894.54

We can translate this to the number of reads per transaction:
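The same quick calculation from the two tables above makes the trend concrete:
InnoDB  @ 5GB:   218343.1 KB/s / 849.07 tps  ≈ 257 KB read per transaction
InnoDB  @ 100GB:   1998.1 KB/s / 5961.2 tps  ≈ 0.3 KB read per transaction
MyRocks @ 5GB:   171957.8 KB/s / 4205.71 tps ≈  41 KB read per transaction
MyRocks @ 100GB:  62894.5 KB/s / 4599.14 tps ≈  14 KB read per transaction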

This shows MyRocks’ read-amplification. The allocation of more memory helps to decrease IO reads, but not as much as for InnoDB.
CPU usage
Let’s also review CPU usage for each storage engine. Let’s start with InnoDB:

The chart shows that for 5GB memory size, InnoDB spends most of its time in IO waits (green area), and the CPU usage (blue area) increases with more memory.
This is the same chart for MyRocks:

In tabular form (vmstat CPU states in %: us = user, sys = system, wa = IO wait, id = idle):

Memory, GB   Engine    us   sys   wa   id
5            InnoDB     8     2   57   33
5            MyRocks   56    11   18   15
10           InnoDB    12     3   57   28
10           MyRocks   57    11   18   13
20           InnoDB    16     4   55   25
20           MyRocks   58    11   19   11
30           InnoDB    20     5   50   25
30           MyRocks   59    11   19   10
40           InnoDB    26     7   44   24
40           MyRocks   60    11   20    9
50           InnoDB    35     8   38   19
50           MyRocks   60    11   21    7
60           InnoDB    43    10   36   10
60           MyRocks   61    11   22    6
70           InnoDB    51    12   34    4
70           MyRocks   61    11   23    5
80           InnoDB    55    12   31    1
80           MyRocks   61    11   23    5
90           InnoDB    55    12   32    1
90           MyRocks   61    11   23    4
100          InnoDB    55    12   32    1
100          MyRocks   61    11   24    4

We can see that MyRocks uses a lot of CPU (in us+sys state) no matter how much memory is allocated. This leads to the conclusion that MyRocks performance is limited more by CPU performance than by available memory.
MyRocks directory size
As MyRocks writes all changes and compacts SST files down the road, it would be interesting to see how the data directory size changes during the benchmark, so we can estimate our storage needs. Here is a chart of the data directory size:

We can see that the data directory grows from 20GB at the start to 31GB during the benchmark. It is interesting to observe the data growing until compaction shrinks it.
Conclusion
In conclusion, I can say that MyRocks performance increases as the ratio of dataset size to memory increases, outperforming InnoDB by almost five times in the case of 5GB memory allocation. Throughput variation is something to be concerned about, but I hope this gets improved in the future.
MyRocks does not require a lot of memory and shows constant write IO, while using most of the CPU resources.
I think this potentially makes MyRocks a great choice for cloud database instances, where both memory and IO can cost a lot. MyRocks would be cheaper to deploy in the cloud.
I will follow up with further cloud-oriented benchmarks.
Extras
Raw results, scripts and config
My goal is to provide fully repeatable benchmarks. To this end, I’m sharing all the scripts and settings I used in the following GitHub repo:
https://github.com/Percona-Lab-results/201803-sysbench-tpcc-myrocks
MyRocks settings

rocksdb_max_open_files=-1
rocksdb_max_background_jobs=8
rocksdb_max_total_wal_size=4G
rocksdb_block_size=16384
rocksdb_table_cache_numshardbits=6
# rate limiter
rocksdb_bytes_per_sync=16777216
rocksdb_wal_bytes_per_sync=4194304
rocksdb_compaction_sequential_deletes_count_sd=1
rocksdb_compaction_sequential_deletes=199999
rocksdb_compaction_sequential_deletes_window=200000
rocksdb_default_cf_options="write_buffer_size=256m;target_file_size_base=32m;max_bytes_for_level_base=512m;max_write_buffer_number=4;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=20;level0_stop_writes_trigger=30;max_write_buffer_number=4;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;memtable_prefix_bloom_size_ratio=0.05;prefix_extractor=capped:12;compaction_pri=kMinOverlappingRatio;compression=kLZ4Compression;bottommost_compression=kLZ4Compression;compression_opts=-14:4:0"
rocksdb_max_subcompactions=4
rocksdb_compaction_readahead_size=16m
rocksdb_use_direct_reads=ON
rocksdb_use_direct_io_for_flush_and_compaction=ON

InnoDB settings

# files
 innodb_file_per_table
 innodb_log_file_size=15G
 innodb_log_files_in_group=2
 innodb_open_files=4000
# buffers
 innodb_buffer_pool_size= 200G
 innodb_buffer_pool_instances=8
 innodb_log_buffer_size=64M
# tune
 innodb_doublewrite= 1
 innodb_support_xa=0
 innodb_thread_concurrency=0
 innodb_flush_log_at_trx_commit= 1
 innodb_flush_method=O_DIRECT_NO_FSYNC
 innodb_max_dirty_pages_pct=90
 innodb_max_dirty_pages_pct_lwm=10
 innodb_lru_scan_depth=1024
 innodb_page_cleaners=4
 join_buffer_size=256K
 sort_buffer_size=256K
 innodb_use_native_aio=1
 innodb_stats_persistent = 1
 #innodb_spin_wait_delay=96
# perf special
 innodb_adaptive_flushing = 1
 innodb_flush_neighbors = 0
 innodb_read_io_threads = 4
 innodb_write_io_threads = 2
 innodb_io_capacity=2000
 innodb_io_capacity_max=4000
 innodb_purge_threads=4
 innodb_adaptive_hash_index=1

Hardware spec
Supermicro server:

CPU:

Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
2 sockets / 28 cores / 56 threads

Memory: 256GB of RAM
Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
Filesystem: ext4
Percona-Server-5.7.21-20
OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic

The post A Look at MyRocks Performance appeared first on Percona Database Performance Blog.

Check Out the Percona Live 2018 Live Stream!

Announcing the Percona Live 2018 live stream.
This year at Percona Live Open Source Database Conference 2018 we are live streaming the Keynote Talks on Day 1 and 2.
Percona is streaming the keynotes on Tuesday, April 24, 2018, and Wednesday, April 25, 2018 beginning at 9 AM PDT (both days). The keynote speakers include people from VividCortex, Upwork, Oracle, Netflix and many more. The keynote panels feature a cloud discussion and a cool technologies showcase.
Use the live stream link if you don’t want to miss a keynote, but can’t be at the main stage. The link for the live stream is:

The list of keynote talks and speakers for each day is:
Day 1

Percona Welcome – Laurie Coffin (Percona)

Open Source for the Modern Business – Peter Zaitsev (Percona)

Cool Technologies Showcase – Shuhao Wu (Shopify Inc.), Sugu Sougoumarane (PlanetScale Data), Nikolay Samokhvalov (Nombox), Andy Pavlo (Carnegie Mellon University)

State of the Dolphin 8.0 – Tomas Ulin (Oracle)

Linux Performance 2018 – Brendan Gregg (Netflix)

Day 2

Percona Welcome Back – Laurie Coffin (Percona)

Database Evolution in the Cloud Panel – Lixun Peng (Alibaba Cloud), Sunil Kamath (Microsoft), Baron Schwartz (VividCortex)

Future Perfect: The New Shape Of The Data Tier – Baron Schwartz (VividCortex)

MongoDB at Upwork – Scott Simpson (Upwork)

The post Check Out the Percona Live 2018 Live Stream! appeared first on Percona Database Performance Blog.

Percona Toolkit 3.0.9 Is Now Available

Percona announces the release of Percona Toolkit 3.0.9 on April 20, 2018.
Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL®, MongoDB® and system tasks that are too difficult or complex to perform manually. With over 1,000,000 downloads, Percona Toolkit supports Percona Server for MySQL, MySQL, MariaDB®, Percona Server for MongoDB and MongoDB.
Percona Toolkit, like all Percona software, is free and open source. You can download packages from the website or install from official repositories.
This release includes the following changes:
New Tools:

PT-1501: pt-secure-collect – new tool to collect and sanitize pt-tools outputs

New Features:

PT-1530: Add support for encryption status to pt-mysql-summary

PT-1526: Add ndb status to pt-mysql-summary (Thanks Fernando Ipar)

PT-1525: Add support for MySQL 8 roles into pt-mysql-summary

PT-1509: Make pt-table-sync only set binlog_format when necessary (Thanks Moritz Lenz)

PT-1508: Add --read-only-interval and --fail-successive-errors flags to pt-heartbeat (Thanks Shlomi Noach)

PT-243: Add --max-hostname-length and --max-line-length flags to pt-query-digest

Bug Fixes:

PT-1527: Fixed pt-table-checksum ignores --nocheck-binlog-format

Improvements:

PT-1507: pt-summary does not reliably read in the transparent huge pages setting (Thanks Nick Veenhof)

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.
The post Percona Toolkit 3.0.9 Is Now Available appeared first on Percona Database Performance Blog.

Percona Monitoring and Management (PMM) 1.10.0 Is Now Available

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL® and MongoDB® servers to ensure that your data works as efficiently as possible.
We focused mainly on two features in 1.10.0, but there are also several notable improvements worth highlighting:

Annotations – Record and display Application Events as Annotations using pmm-admin annotate
Grafana 5.0 – Improved visualization effects
Switching between Dashboards – Restored functionality to preserve host when switching dashboards
New Percona XtraDB Cluster Overview graphs – Added Galera Replication Latency graphs on Percona XtraDB Cluster Overview dashboard with consistent colors

The release includes four new features and improvements, and eight bug fixes.
Annotations
Application events are one of the contributors to changes in database performance characteristics, and in this release PMM now supports receiving events and displaying them as Annotations using the new command pmm-admin annotate. A recent Percona survey reveals that Database and DevOps Engineers highly value visibility into the Application layer.  By displaying Application Events on top of your PMM graphs, Engineers can now correlate Application Events (common cases: Application Deploys, Outages, and Upgrades) against Database and System level metric changes.

Usage
For example, say you have just completed an application deployment of version 1.2 that is relevant to the UI only, so you want to set tags for the version and the interface impacted:

pmm-admin annotate "Application deploy v1.2" --tags "UI, v1.2"
Using the optional --tags allows you to filter which Annotations are displayed on the dashboard via a toggle option. Read more about Annotations utilization in the Documentation.
Grafana 5.0
We’re extremely pleased to see Grafana ship 5.0 and we were fortunate enough to be at Grafanacon, including Percona’s very own Dimitri Vanoverbeke (Dim0) who presented What we Learned Integrating Grafana and Prometheus!

Why Analyze Raw MySQL Query Logs?

In this blog post, I’ll examine when looking at raw MySQL query logs can be more useful than working with tools that only have summary data.
In my previous blog post, I wrote about analyzing MySQL Slow Query Logs with ClickHouse and ClickTail. One of the follow-up questions I got is when you would want to do that, compared to just using tools like Percona Monitoring and Management or VividCortex, which provide a beautiful interface for detailed analysis (rather than a spartan SQL interface).
MySQL Logs
A lot of folks are confused about what query logs MySQL has, and what you can use them for. First, MySQL has a “General Query Log”. As the name implies, this is a general-purpose query log. You would think this is the first log you should use, but it is, in fact, pretty useless:
2018-03-31T15:38:44.521650Z      2356 Query SELECT c FROM sbtest1 WHERE id=164802
2018-03-31T15:38:44.521790Z      2356 Query SELECT c FROM sbtest1 WHERE id BETWEEN 95241 AND 95340
2018-03-31T15:38:44.522168Z      2356 Query SELECT SUM(k) FROM sbtest1 WHERE id BETWEEN 1 AND 100
2018-03-31T15:38:44.522500Z      2356 Query SELECT c FROM sbtest1 WHERE id BETWEEN 304556 AND 304655 ORDER BY c
2018-03-31T15:38:44.522941Z      2356 Query SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 924 AND 1023 ORDER BY c
2018-03-31T15:38:44.523525Z      2356 Query UPDATE sbtest1 SET k=k+1 WHERE id=514
As you can see, it only has very limited information about queries: no query execution time or which user is running the query. This type of log is helpful if you want to see very clean, basic information on what queries your application is really running. It can also help debug MySQL crashes because, unlike other log formats, the query is written to this log file before MySQL attempts to execute the query.
The MySQL Slow Log is, in my opinion, much more useful (especially with the Percona Server Slow Query Log Extensions). Again, as the name implies, you would think it is only used for slow queries (and by default, it is). However, you can set long_query_time to 0 (with a few other options) to get all queries logged here, with lots of rich information about query execution:
# Time: 2018-03-31T15:48:55.795145Z
# User@Host: sbtest[sbtest] @ localhost []  Id: 2332
# Schema: sbtest  Last_errno: 0 Killed: 0
# Query_time: 0.000143  Lock_time: 0.000047 Rows_sent: 1  Rows_examined: 1 Rows_affected: 0
# Bytes_sent: 188  Tmp_tables: 0 Tmp_disk_tables: 0  Tmp_table_sizes: 0
# QC_Hit: No  Full_scan: No Full_join: No  Tmp_table: No Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
#   InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 0
# Log_slow_rate_type: query  Log_slow_rate_limit: 10
SET timestamp=1522511335;
SELECT c FROM sbtest1 WHERE id=2428336;
Finally, there is the MySQL Audit Log, which is part of the MySQL Enterprise offering, and the format-compatible Percona Server for MySQL Audit Log Plugin. This is designed for auditing access to the server, and as such it records the matching details in the log. Unlike the first two log formats, it is designed first and foremost to be machine-readable and supports JSON, XML and CSV output formats:
{"audit_record":{"name":"Query","record":"743017006_2018-03-31T01:03:12","timestamp":"2018-03-31T15:53:42 UTC","command_class":"select","connection_id":"2394","status":0,"sqltext":"SELECT SUM(k) FROM sbtest1 WHERE id BETWEEN 3 AND 102","user":"sbtest[sbtest] @ localhost []","host":"localhost","os_user":"","ip":"","db":"sbtest"}}
{"audit_record":{"name":"Query","record":"743017007_2018-03-31T01:03:12","timestamp":"2018-03-31T15:53:42 UTC","command_class":"select","connection_id":"2394","status":0,"sqltext":"SELECT c FROM sbtest1 WHERE id BETWEEN 2812021 AND 2812120 ORDER BY c","user":"sbtest[sbtest] @ localhost []","host":"localhost","os_user":"","ip":"","db":"sbtest"}}
{"audit_record":{"name":"Query","record":"743017008_2018-03-31T01:03:12","timestamp":"2018-03-31T15:53:42 UTC","command_class":"select","connection_id":"2394","status":0,"sqltext":"SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 1 AND 100 ORDER BY c","user":"sbtest[sbtest] @ localhost []","host":"localhost","os_user":"","ip":"","db":"sbtest"}}
As you can see, there are substantial differences in the purposes of the different MySQL log formats, along with the information they provide.
Why analyze raw MySQL query logs
In my opinion, there are two main reasons to look directly at raw log files without aggregation (you might find others):

Auditing, where the Audit Log is useful (Vadim recently blogged about it)

Advanced MySQL/application debugging, where an aggregated summary might not allow you to drill down to the fullest level of detail

When you’re debugging using MySQL logs, the Slow Query Log, set to log all queries with no sampling, is the most useful. Of course, this can cause significant additional overhead in many workloads, so it is best to do it in a development environment (if you can repeat the situation you’re looking to analyze). At the very least, don’t do it during peak time.
For Percona Server for MySQL, these options ensure it logs all queries to the query log with no sampling:
log_output=file
slow_query_log=ON
long_query_time=0
log_slow_rate_limit=1
log_slow_verbosity=full
log_slow_admin_statements=ON
log_slow_slave_statements=ON
slow_query_log_always_write_time=1
Now that we have full queries, we can easily use Linux command-line tools like grep and others to look into what is going on (a quick sketch follows below). However, many times this isn’t convenient. This is where loading the logs into storage that you can conveniently query is a good solution.
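For instance, plain grep can already answer simple questions (adjust the log path to your configuration):
grep -c "^# Query_time:" /var/log/mysql/mysql-slow.log               # count logged queries
grep -A 12 "User@Host: sbtest" /var/log/mysql/mysql-slow.log | less  # inspect entries for one user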
Let’s look into some specific and interesting cases.
Were any queries killed?
SELECT
   _time,
   query,
   query_time
FROM mysql_slow_log
WHERE killed > 0
┌───────────────_time─┬─query───────────────────────────────┬─query_time─┐
│ 2018-04-02 19:02:56 │ select benchmark(10000000000,"1+1") │  10.640794 │
└─────────────────────┴─────────────────────────────────────┴────────────┘
1 rows in set. Elapsed: 0.242 sec. Processed 929.14 million rows, 1.86 GB (3.84 billion rows/s., 7.67 GB/s.)
Yes. A query got killed after running for 10 seconds.
Did any query fail? With what error codes?
SELECT
   error_num,
   min(_time),
   max(_time),
   count(*)
FROM mysql_slow_log
GROUP BY error_num
┌─error_num─┬──────────min(_time)─┬──────────max(_time)─┬───count()─┐
│         0 │ 2018-04-02 18:59:49 │ 2018-04-07 19:39:27 │ 925428375 │
│      1160 │ 2018-04-02 19:02:56 │ 2018-04-02 19:02:56 │         1 │
│      1213 │ 2018-04-02 19:00:00 │ 2018-04-07 19:18:14 │   3709520 │
│      1054 │ 2018-04-07 19:38:14 │ 2018-04-07 19:38:14 │         1 │
└───────────┴─────────────────────┴─────────────────────┴───────────┘
4 rows in set. Elapsed: 2.391 sec. Processed 929.14 million rows, 7.43 GB (388.64 million rows/s., 3.11 GB/s.)
You can resolve error codes with the perror command:
root@rocky:~# perror 1054
MySQL error code 1054 (ER_BAD_FIELD_ERROR): Unknown column '%-.192s' in '%-.192s'
This command has many uses. You can use it to hunt down application issues (like in this example of a missing column — likely due to bad or old code). It can also help you to spot SQL injection attempts that often cause queries with bad syntax, and troubleshoot deadlocks or foreign key violations.
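For example, here is a quick sketch that hunts for possible SQL injection attempts via syntax errors (MySQL error 1064, ER_PARSE_ERROR):
SELECT
   _time,
   query
FROM mysql_slow_log
WHERE error_num = 1064
LIMIT 10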
Are there any nasty, long transactions?
SELECT
   transaction_id,
   max(_time) - min(_time) AS run_time,
   count(*) AS num_queries,
   sum(rows_affected) AS rows_changed
FROM mysql_slow_log
WHERE transaction_id != ''
GROUP BY transaction_id
ORDER BY rows_changed DESC
LIMIT 10
┌─transaction_id─┬─run_time─┬─num_queries─┬─rows_changed─┐
│ 17E070082      │        0 │           1 │         9999 │
│ 17934C73C      │        2 │           6 │            4 │
│ 178B6D346      │        0 │           6 │            4 │
│ 17C909086      │        2 │           6 │            4 │
│ 17B45EFAD      │        5 │           6 │            4 │
│ 17ABAB840      │        0 │           6 │            4 │
│ 17A36AD3F      │        3 │           6 │            4 │
│ 178E037A5      │        1 │           6 │            4 │
│ 17D1549C9      │        0 │           6 │            4 │
│ 1799639F2      │        1 │           6 │            4 │
└────────────────┴──────────┴─────────────┴──────────────┘
10 rows in set. Elapsed: 15.574 sec. Processed 930.58 million rows, 18.23 GB (59.75 million rows/s., 1.17 GB/s.)
Finding transactions that modify a lot of rows, like transaction 17E070082 above, can be very helpful to ensure you control MySQL replication slave lag. It is also critical if you’re looking to migrate to MySQL Group Replication or Percona XtraDB Cluster.
What statements were executed in a long transaction?
SELECT
   _time,
   _ms,
   query
FROM mysql_slow_log
WHERE transaction_id = '17E070082'
ORDER BY
   _time ASC,
   _ms ASC
LIMIT 10
┌───────────────_time─┬────_ms─┬─query─────────────────────────────────┐
│ 2018-04-07 20:08:43 │ 890693 │ update sbtest1 set k=0 where id<10000 │
└─────────────────────┴────────┴───────────────────────────────────────┘
1 rows in set. Elapsed: 2.361 sec. Processed 931.04 million rows, 10.79 GB (394.27 million rows/s., 4.57 GB/s.)
I used transaction 17E070082 from the previous query above (which modified 9999 rows). Note that this schema improves compression by storing the seconds and microseconds parts of the timestamp in different columns.
Were any queries dumping large numbers of rows from the database?
SELECT
   _time,
   query,
   rows_sent,
   bytes_sent
FROM mysql_slow_log
WHERE rows_sent > 10000
┌───────────────_time─┬─query────────────────────────────────────────────┬─rows_sent─┬─bytes_sent─┐
│ 2018-04-07 20:21:08 │ SELECT /*!40001 SQL_NO_CACHE */ * FROM `sbtest1` │  10000000 │ 1976260712 │
└─────────────────────┴──────────────────────────────────────────────────┴───────────┴────────────┘
1 rows in set. Elapsed: 0.294 sec. Processed 932.19 million rows, 3.73 GB (3.18 billion rows/s., 12.71 GB/s.)
Did someone update a record?
SELECT
   _time,
   query
FROM mysql_slow_log
WHERE (rows_affected > 0) AND (query LIKE '%id=3301689%')
LIMIT 1
┌───────────────_time─┬─query─────────────────────────────────────┐
│ 2018-04-02 19:04:48 │ UPDATE sbtest1 SET k=k+1 WHERE id=3301689 │
└─────────────────────┴───────────────────────────────────────────┘
1 rows in set. Elapsed: 0.046 sec. Processed 2.29 million rows, 161.60 MB (49.57 million rows/s., 3.49 GB/s.)
Note that I’m cheating here by assuming we know an update used a primary key, but it is practically helpful in a lot of cases.
These are just some of the examples of what you can find out by querying raw slow query logs. They contain a ton of information about query execution (especially in Percona Server for MySQL) that allows you to use them both for performance analysis and some security and auditing purposes.
The post Why Analyze Raw MySQL Query Logs? appeared first on Percona Database Performance Blog.

Using Hints to Analyze Queries

In this blog post, we’ll look at using hints to analyze queries.
There are a lot of things that you can do wrong when writing a query, which means that there are a lot of things you can do to make it better. From my personal experience, there are two things you should review first:

The table join order
Which index is being used

Why only those two? Because many other alternatives are more expensive, and in the end, query optimization is a cost-effectiveness analysis. This is why we must start with the simplest fixes. We can control these two aspects with the hints "straight_join" and "force index", which allow us to execute the query with the plan that we would like to test.
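As a quick syntax reference (table and index names here are placeholders):
SELECT STRAIGHT_JOIN ... FROM t1 JOIN t2 ON ...;      -- join the tables in the order listed
SELECT ... FROM t1 FORCE INDEX (idx_name) WHERE ...;  -- force the optimizer to consider only idx_name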
Join Order
In a query where we use multiple tables or subqueries, we have some particular fields that we are going to use to join the tables. Those fields could be the primary key of the table, the first part of a secondary index, neither, or both. But before we analyze possible scenarios, table structure or indexes, we need to establish the best order for that query to join the tables.
When we talk about join order with several tables to join, one possible scenario is that a table joins one table using its primary key and joins other tables using another field. For instance:
select
table_a.id, table_b.value1, table_c.value1
from
table_a join
table_b on table_a.id = table_b.id join
table_c on table_b.id_c = table_c.id
where
table_a.value1=10;
We get this explain:
+----+-------------+---------+--------+----------------+---------+---------+------------------------------------+------+-------------+
| id | select_type | table   | type   | possible_keys  | key     | key_len | ref                                | rows | Extra       |
+----+-------------+---------+--------+----------------+---------+---------+------------------------------------+------+-------------+
|  1 | SIMPLE      | table_a | ref    | PRIMARY,value1 | value1  | 5       | const                              |    1 | Using index |
|  1 | SIMPLE      | table_b | eq_ref | PRIMARY        | PRIMARY | 4       | bp_query_optimization.table_a.id   |    1 | Using where |
|  1 | SIMPLE      | table_c | eq_ref | PRIMARY        | PRIMARY | 4       | bp_query_optimization.table_b.id_c |    1 | NULL        |
+----+-------------+---------+--------+----------------+---------+---------+------------------------------------+------+-------------+
It is filtering by value1 on table_a, which joins with table_b using the primary key, and table_c uses the value of id_c that it gets from table_b.
But we can change the table order and use straight_join:
select straight_join
table_a.id, table_b.value1, table_c.value1
from
table_c join
table_b on table_b.id_c = table_c.id join
table_a on table_a.id = table_b.id
where
table_a.value1=10;
The query is semantically the same, but now we get this explain:
+----+-------------+---------+--------+----------------+---------+---------+----------------------------------+------+-------------+
| id | select_type | table   | type   | possible_keys  | key     | key_len | ref                              | rows | Extra       |
+----+-------------+---------+--------+----------------+---------+---------+----------------------------------+------+-------------+
|  1 | SIMPLE      | table_c | ALL    | PRIMARY        | NULL    | NULL    | NULL                             |    1 | NULL        |
|  1 | SIMPLE      | table_b | ref    | PRIMARY,id_c   | id_c    | 5       | bp_query_optimization.table_c.id |    1 | NULL        |
|  1 | SIMPLE      | table_a | eq_ref | PRIMARY,value1 | PRIMARY | 4       | bp_query_optimization.table_b.id |    1 | Using where |
+----+-------------+---------+--------+----------------+---------+---------+----------------------------------+------+-------------+
In this case, we are performing a full table scan over table_c, which then joins with table_b using index over id_c to finally join table_a using the primary key.
Sometimes the optimizer chooses the incorrect join order because of bad statistics. I found myself reviewing the first query with the second explain plan, where the only thing that I did to find the query problem was to add “STRAIGHT_JOIN” to the query.
Taking into account that the optimizer could fail on this task, we found a practical way to force it to do what we want (change the join order).
It is also useful for finding out when an index is missing. For example:
SELECT costs.id as cost_id, spac_types.id as spac_type_id
FROM
spac_types INNER JOIN
costs_spac_types ON costs_spac_types.spac_type_id = spac_types.id INNER JOIN
costs ON costs.id = costs_spac_types.cost_id
WHERE spac_types.place_id = 131;
The explain plan shows:
+----+-------------+------------------+--------+-----------------------------------------------------+-----------------------------------------------------+---------+-----------------------------------+-------+-------------+
| id | select_type | table            | type   | possible_keys                                       | key                                                 | key_len | ref                               | rows  | Extra       |
+----+-------------+------------------+--------+-----------------------------------------------------+-----------------------------------------------------+---------+-----------------------------------+-------+-------------+
|  1 | SIMPLE      | costs_spac_types | index  | index_costs_spac_types_on_cost_id_and_spac_type_id  | index_costs_spac_types_on_cost_id_and_spac_type_id  | 8       | NULL                              | 86408 | Using index |
|  1 | SIMPLE      | spac_types       | eq_ref | PRIMARY,index_spac_types_on_place_id_and_spac_type  | PRIMARY                                             | 4       | pms.costs_spac_types.spac_type_id |     1 | Using where |
|  1 | SIMPLE      | costs            | eq_ref | PRIMARY                                             | PRIMARY                                             | 4       | pms.costs_spac_types.cost_id      |     1 | Using index |
+----+-------------+------------------+--------+-----------------------------------------------------+-----------------------------------------------------+---------+-----------------------------------+-------+-------------+
It is starting with costs_spac_types and then using the clustered index for the next two tables. The explain doesn’t look bad!
However, it was taking longer than this:
SELECT STRAIGHT_JOIN costs.id as cost_id, spac_types.id as spac_type_id
FROM
spac_types INNER JOIN
costs_spac_types ON costs_spac_types.spac_type_id = spac_types.id INNER JOIN
costs ON costs.id = costs_spac_types.cost_id
WHERE spac_types.place_id = 131;
0.17 sec versus 0.09 sec. This is the explain plan:
+----+-------------+------------------+--------+-----------------------------------------------------+-----------------------------------------------------+---------+------------------------------+-------+------------------------------------------------------------------+
| id | select_type | table            | type   | possible_keys                                       | key                                                 | key_len | ref                          | rows  | Extra                                                            |
+----+-------------+------------------+--------+-----------------------------------------------------+-----------------------------------------------------+---------+------------------------------+-------+------------------------------------------------------------------+
|  1 | SIMPLE      | spac_types       | ref    | PRIMARY,index_spac_types_on_place_id_and_spac_type  | index_spac_types_on_place_id_and_spac_type          | 4       | const                        |    13 | Using index                                                      |
|  1 | SIMPLE      | costs_spac_types | index  | index_costs_spac_types_on_cost_id_and_spac_type_id  | index_costs_spac_types_on_cost_id_and_spac_type_id  | 8       | NULL                         | 86408 | Using where; Using index; Using join buffer (Block Nested Loop)  |
|  1 | SIMPLE      | costs            | eq_ref | PRIMARY                                             | PRIMARY                                             | 4       | pms.costs_spac_types.cost_id |     1 | Using index                                                      |
+----+-------------+------------------+--------+-----------------------------------------------------+-----------------------------------------------------+---------+------------------------------+-------+------------------------------------------------------------------+
Reviewing the table structure:
CREATE TABLE costs_spac_types (
id int(11) NOT NULL AUTO_INCREMENT,
cost_id int(11) NOT NULL,
spac_type_id int(11) NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY index_costs_spac_types_on_cost_id_and_spac_type_id (cost_id,spac_type_id)
) ENGINE=InnoDB AUTO_INCREMENT=172742 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
I saw that the unique index was over cost_id and then spac_type_id. After adding this index:
ALTER TABLE costs_spac_types ADD UNIQUE KEY (spac_type_id,cost_id);
Now, the explain plan without STRAIGHT_JOIN is:
+----+-------------+------------------+--------+------------------------------------------------------------------+---------------------------------------------+---------+------------------------------+------+-------------+
| id | select_type | table            | type   | possible_keys                                                    | key                                         | key_len | ref                          | rows | Extra       |
+----+-------------+------------------+--------+------------------------------------------------------------------+---------------------------------------------+---------+------------------------------+------+-------------+
|  1 | SIMPLE      | spac_types       | ref    | PRIMARY,index_spac_types_on_place_id_and_spac_type               | index_spac_types_on_place_id_and_spac_type  | 4       | const                        |   13 | Using index |
|  1 | SIMPLE      | costs_spac_types | ref    | index_costs_spac_types_on_cost_id_and_spac_type_id,spac_type_id  | spac_type_id                                | 4       | pms.spac_types.id            |   38 | Using index |
|  1 | SIMPLE      | costs            | eq_ref | PRIMARY                                                          | PRIMARY                                     | 4       | pms.costs_spac_types.cost_id |    1 | Using index |
+----+-------------+------------------+--------+------------------------------------------------------------------+---------------------------------------------+---------+------------------------------+------+-------------+
Which is much better, as it is scanning fewer rows and the query time is just 0.01 seconds.
Indexes
The optimizer has the choice of using a clustered index, a secondary index, a partial secondary index or no index at all, which means that it uses the clustered index.
Sometimes the optimizer ignores the use of an index because it thinks reading the rows directly is faster than an index lookup:
mysql> explain select * from table_c where id=1;
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | table_c | const | PRIMARY       | PRIMARY | 4       | const |    1 | NULL  |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+
mysql> explain select * from table_c where value1=1;
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | table_c | ALL  | NULL          | NULL | NULL    | NULL |    1 | Using where |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
In both cases, we are reading directly from the clustered index.
Then, we have secondary indexes that are partially used and/or only partially useful for the query. This means that we are going to scan the index and then look up the clustered index. YES! TWO STRUCTURES WILL BE USED! We usually don’t realize any of this, but it is like an extra join between the secondary index and the clustered index.
Finally, the covering index, which is simple to identify as “Using index” in the extra column:
mysql> explain select value1 from table_a where value1=1;
+----+-------------+---------+------+---------------+--------+---------+-------+------+-------------+
| id | select_type | table   | type | possible_keys | key    | key_len | ref   | rows | Extra       |
+----+-------------+---------+------+---------------+--------+---------+-------+------+-------------+
|  1 | SIMPLE      | table_a | ref  | value1        | value1 | 5       | const |    1 | Using index |
+----+-------------+---------+------+---------------+--------+---------+-------+------+-------------+

Index Analysis
As I told you before, this is a cost-effectiveness analysis from the point of view of query performance. Most of the time it is fastest to use covering indexes, then secondary indexes, and finally the clustered index. However, covering indexes are usually more expensive for writes, as you need more fields to cover the query’s needs. So we often end up using a secondary index that also uses the clustered index. If the number of rows is not large and we are selecting most of them, however, it could be even faster to perform a full table scan. Another thing to take into account is that the number of indexes affects the write rate.
Let’s do an analysis. This is a common query:
mysql> explain select * from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
It is using all the fields of each table.
This is more restrictive:
mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
But it is performing a full table scan over t2, and then using t2.value1 to look up t1 via the clustered index.
Let’s add an index on table_index_analisis_2 over value1:
mysql> alter table table_index_analisis_2 add key (value1);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
The explain shows that it is not being used, not even when we force it:
mysql> explain select * from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | value1        | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
mysql> explain select * from table_index_analisis_1 t1, table_index_analisis_2 t2 force key (value1) where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | value1        | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
This is because the optimizer considers performing a full table scan better than using a part of the index.
Now we are going to add an index over value1 and value2:
mysql> alter table table_index_analisis_2 add key (value1,value2);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys   | key      | key_len | ref                             | rows | Extra                    |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
|  1 | SIMPLE      | t2    | index  | value1,value1_2 | value1_2 | 10      | NULL                            |   64 | Using where; Using index |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY         | PRIMARY  | 4       | bp_query_optimization.t2.value1 |    1 | NULL                     |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
We can see that now it is using the index, and in the extra column says “Using index” — which means that it is not using the clustered index.
Finally, we are going to add an index over table_index_analisis_1, in the best way for it to be used by this query:
mysql> alter table table_index_analisis_1 add key (id,value1,value2);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys   | key      | key_len | ref                             | rows | Extra                    |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
|  1 | SIMPLE      | t2    | index  | value1,value1_2 | value1_2 | 10      | NULL                            |   64 | Using where; Using index |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY,id      | PRIMARY  | 4       | bp_query_optimization.t2.value1 |    1 | NULL                     |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
2 rows in set (0.00 sec)
However, it is not selected by the optimizer. That is why we need to force it:
mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1 force index(id), table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+-------+-----------------+----------+---------+---------------------------------+------+--------------------------+
| id | select_type | table | type  | possible_keys   | key      | key_len | ref                             | rows | Extra                    |
+----+-------------+-------+-------+-----------------+----------+---------+---------------------------------+------+--------------------------+
|  1 | SIMPLE      | t2    | index | value1,value1_2 | value1_2 | 10      | NULL                            |   64 | Using where; Using index |
|  1 | SIMPLE      | t1    | ref   | id              | id       | 4       | bp_query_optimization.t2.value1 |    1 | Using index              |
+----+-------------+-------+-------+-----------------+----------+---------+---------------------------------+------+--------------------------+
2 rows in set (0.00 sec)
Now, we are just using the secondary index in both cases.
Conclusions
There are many more aspects we could review when analyzing queries, such as the handler used, table design, etc. However, in my opinion, it is useful to focus on these two at the beginning of the analysis.
I would also like to point out that using hints is not a long-term solution! Hints should be used only during the analysis phase.
The post Using Hints to Analyze Queries appeared first on Percona Database Performance Blog.
