Month: October 2017

MySQL Dashboard Improvements in Percona Monitoring and Management 1.4.0

In this blog post, I’ll walk through some of the improvements to the Percona Monitoring and Management (PMM) MySQL dashboard in release 1.4.0.
As part of Percona Monitoring and Management development, we’re constantly looking for better ways to visualize information and help you spot and resolve problems faster. We’ve made some updates to the MySQL dashboard in the 1.4.0 release. You can see those improvements in action in our Percona Monitoring and Management Demo Site: check out the MySQL Overview and MySQL InnoDB Metrics dashboards.
MySQL Client Thread Activity

One of the best ways to characterize a MySQL workload is to look at the number of MySQL server-client connections (Threads Connected). You should compare this number to how many of those threads are actually doing something on the server side (Threads Running), rather than just sitting idle waiting for a client to send the next request.
MySQL can handle thousands of connected threads quite well. However, many threads (hundreds) running concurrently often increase query latency. Increased internal contention can make the situation much worse.
The problem with these metrics is that they are extremely volatile: one second you might have a lot of threads connected and running, and the next none. This is especially true when stalls at the MySQL level (or higher) cause pile-ups.
To provide better insight, we now show Peak Threads Connected and Peak Threads Running to help you easily spot such potential pile-ups, as well as Avg Threads Running. These stats let you look at a high number of threads connected and running and see whether there are just minor spikes (which tend to happen in many systems on a regular basis), or something more prolonged that warrants deeper investigation.
To simplify it even further: Threads Running spiking for a few seconds is OK, but spikes persisting for 5-10 seconds or more are often signs of problems that are impacting users (or problems about to happen).
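If you want to spot-check these counters outside of PMM, they come from standard MySQL status variables. A minimal sketch (the one-second sampling loop is just an example, not part of PMM):

$ mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Threads_connected','Threads_running')"
# Threads_connected = client connections currently open
# Threads_running   = connections actively executing a statement right now
# Sample a few times in a row to catch short spikes:
$ for i in $(seq 1 5); do mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_running'"; sleep 1; done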
InnoDB Logging Performance

Since I wrote a blog post about Choosing MySQL InnoDB Log File Size, I thought it would be great to check out how long the log file space would last (instead of just looking at how much log space is written per hour). Knowing how long the innodb_log_buffer_size lasts is also helpful for tuning this variable, in general.
This graph shows you how much data is written to the InnoDB Log Files, which helps to understand your disk bandwidth consumption. It also tells you how long it will take to go through your combined Redo Log Space and InnoDB Log Buffer Size (at this rate).
As I wrote in the blog post, there are a lot of considerations for choosing the InnoDB log file size, but having enough log space to accommodate all the changes for an hour is a good rule of thumb. As we can see, this system is close to full at around 50 minutes.
When it comes to innodb_log_buffer_size, even if InnoDB is not configured to flush the log at every transaction commit, it is going to be flushed every second by default. This means 10-15 seconds is usually good enough to accommodate the spikes. This system has it set at about 40 seconds (which is more than enough).
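If you want to reproduce this estimate by hand, the dashboard’s numbers can be approximated from the Innodb_os_log_written status counter. A rough sketch (the 60-second window and the 2 x 256M redo configuration are example values only):

$ mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written'"    # note the value, in bytes
$ sleep 60
$ mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written'"    # note it again
# (second value - first value) = redo bytes written per minute
# Combined redo space = innodb_log_file_size * innodb_log_files_in_group (e.g. 2 x 256M = 512M)
# Minutes of log space remaining = combined redo space / bytes written per minute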
InnoDB Read-Ahead

This is a fairly advanced graph that helps you understand how well InnoDB Read-Ahead is working out.
In general, Innodb Read-Ahead is not very well understood. I think in most cases it is hard to tell if it is helping or hurting the current workload in its current configuration.
The goal of Read-Ahead in any system (not just InnoDB) is to pre-fetch data before it is really needed, in order to reduce latency and improve performance. The risk, however, is pre-fetching data that isn’t needed. This is wasteful.
InnoDB has two Read-Ahead options: Linear Read-Ahead (designed to speed up workloads that have physically sequential data access) and Random Read-Ahead (designed to help workloads that tend to access the data in the same vicinity but not in a linear order).
Due to potential overhead, only Linear Read-Ahead is enabled by default. You need to enable Random Read-Ahead separately if you want to determine its impact on your workload.
Back to the graph in question: we show the number of pages pre-fetched by Linear and Random Read-Ahead to confirm whether these are even in use with your workload. We show the Number of Pages Fetched but Never Accessed (evicted without access), both as a number of pages and as a percentage of pages. If Fetched but Never Accessed is more than 30% or so, Read-Ahead might be producing more waste than benefit for your workload, and it might need tuning.
We also show the portion of IO requests that InnoDB Read-Ahead served, which can help you understand the share of resources spent on InnoDB Read-Ahead.
Due to the timing of how InnoDB increments counters, the percentages of IO used for Read-Ahead and of pages evicted without access show up better on larger-scale graphs.
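If you want to check the raw counters behind this graph yourself, they are exposed as standard InnoDB status variables. A minimal sketch (the 30% figure above remains a rule of thumb, not a hard limit):

$ mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_ahead%'"
# Innodb_buffer_pool_read_ahead          - pages pre-fetched by linear read-ahead
# Innodb_buffer_pool_read_ahead_rnd      - pages pre-fetched by random read-ahead
# Innodb_buffer_pool_read_ahead_evicted  - pre-fetched pages evicted without ever being accessed
# evicted / (read_ahead + read_ahead_rnd) approximates the "fetched but never accessed" percentage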
Conclusion
I hope you find these graphs helpful. We’ll continue making Percona Monitoring and Management more helpful for troubleshooting database systems and getting better performance!

MySQL 5.7 InnoDB Temporary Tablespace – but why?

So, recently we had a runaway query eat up all sorts of temporary table space on our machines. Several machines had several terabytes in their ibtmp1 file after this happened. So I set out to find out more about why the InnoDB temporary tablespace is used, why it is better than using regular files, which was what was used prior to MySQL 5.7, and how to make sure that runaway queries do not end up filling up disk space.
Unfortunately, the manual does not go into why ibtmp1 is better than one file per temporary query, which disappears once the query ends. There are a few sections to look into:
Temporary Table Undo Logs – has one paragraph that states that these are the undo logs for temporary tablespaces. Given that these are undo logs, my guess is that this makes MySQL more crash-safe. But that is just a guess.
There is also InnoDB Temporary Tablespace which is two paragraphs, with some more details, but again, no reasoning why.
And finally, the documentation for the innodb_temp_data_file_path system variable sheds a bit of light on the subject: it explains that “Metadata about active InnoDB temporary tables is located in INFORMATION_SCHEMA.INNODB_TEMP_TABLE_INFO.”
There is a manual page for the INNODB_TEMP_TABLE_INFO table as well, which shows both compressed and uncompressed tables – uncompressed temporary tables live in the ibtmp tablespace, and compressed temporary tables live in the .ibd tablespace of the compressed table in question – as shown in the “PER_TABLE_SPACE” and “IS_COMPRESSED” fields.
Sadly, the table does not give useful information such as which process ID or user is generating the data. And of course it is only the active temporary space usage at the time – if you have a large temporary tablespace but no active queries using the tablespace, INNODB_TEMP_TABLE_INFO is empty.
I can imagine a scenario with more than one long-running query using a lot of space in the temporary tablespace. But I do not see how the INNODB_TEMP_TABLE_INFO would help me determine any useful information as to which query I should kill. I guess it is useful to see if there is an active query currently using temporary tablespace, but when you have a large file with nothing in it, it is just that much more depressing.
# mysql INFORMATION_SCHEMA -p -e "SELECT * FROM INNODB_TEMP_TABLE_INFO"
Enter password:
# ls -rlth /var/lib/mysql/ibtmp1
-rw-r----- 1 mysql mysql 2.3T Oct 31 10:50 /var/lib/mysql/ibtmp1
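As for keeping a runaway query from filling the disk in the first place, one option (a general suggestion, not something the manual sections above spell out) is to cap how far ibtmp1 can autoextend with innodb_temp_data_file_path; note that shrinking an already-grown ibtmp1 still requires a server restart:

[mysqld]
# example cap: start at 12M, autoextend, but never grow beyond 10G
innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:10G

With a cap in place, a runaway query that needs more temporary space fails with an error instead of eating the whole filesystem.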

How to Stop or Throttle SST Operation on a Galera Cluster

State Snapshot Transfer (SST) is one of the two ways used by Galera to perform initial syncing when a node is joining a cluster, until the node is declared synced and part of the “primary component”. Depending on the dataset size and workload, SST can be lightning fast, or an expensive operation that will bring your database service to its knees.

SST can be performed using 3 different methods:

mysqldump
rsync (or rsync_wan)
xtrabackup (or xtrabackup-v2, mariabackup)

Most of the time, xtrabackup-v2 and mariabackup are the preferred options. We rarely see people running on rsync or mysqldump in production clusters.

The Problem

When SST is initiated, there are several processes triggered on the joiner node, which are executed by the “mysql” user:

$ ps -fu mysql
UID PID PPID C STIME TTY TIME CMD
mysql 117814 129515 0 13:06 ? 00:00:00 /bin/bash -ue /usr//bin/wsrep_sst_xtrabackup-v2 --role donor --address 192.168.55.173:4444/xtrabackup_sst//1 --socket /var/lib/mysql/mysql.sock --datadir
mysql 120036 117814 15 13:06 ? 00:00:06 innobackupex --no-version-check --tmpdir=/tmp/tmp.pMmzIlZJwa --user=backupuser --password=x xxxxxxxxxxxxxx --socket=/var/lib/mysql/mysql.sock --galera-inf
mysql 120037 117814 19 13:06 ? 00:00:07 socat -u stdio TCP:192.168.55.173:4444
mysql 129515 1 1 Oct27 ? 01:11:46 /usr/sbin/mysqld --wsrep_start_position=7ce0e31f-aa46-11e7-abda-56d6a5318485:4949331

While on the donor node:

mysql 43733 1 14 Oct16 ? 03:28:47 /usr/sbin/mysqld --wsrep-new-cluster --wsrep_start_position=7ce0e31f-aa46-11e7-abda-56d6a5318485:272891
mysql 87092 43733 0 14:53 ? 00:00:00 /bin/bash -ue /usr//bin/wsrep_sst_xtrabackup-v2 --role donor --address 192.168.55.172:4444/xtrabackup_sst//1 --socket /var/lib/mysql/mysql.sock --datadir /var/lib/mysql/ --gtid 7ce0e31f-aa46-11e7-abda-56d6a5318485:2883115 --gtid-domain-id 0
mysql 88826 87092 30 14:53 ? 00:00:05 innobackupex --no-version-check --tmpdir=/tmp/tmp.LDdWzbHkkW --user=backupuser --password=x xxxxxxxxxxxxxx --socket=/var/lib/mysql/mysql.sock --galera-info --stream=xbstream /tmp/tmp.oXDumYf392
mysql 88827 87092 30 14:53 ? 00:00:05 socat -u stdio TCP:192.168.55.172:4444

SST against a large dataset (hundreds of GBytes) is no fun. Depending on the hardware, network and workload, it may take hours to complete. Server resources may be saturated during the operation. Although throttling is supported in SST (only for xtrabackup and mariabackup) using the --rlimit and --use-memory options, we are still exposed to a degraded cluster when we run out of a majority of active nodes; for example, if you are unlucky enough to find yourself with only one out of three nodes running. Therefore, you are advised to perform SST during quiet hours. You can, however, avoid SST altogether by taking some manual steps, as described in this blog post.

Stopping an SST

Stopping an SST needs to be done on both the donor and the joiner nodes. The joiner triggers SST after determining how big the gap is between the local Galera seqno and the cluster’s seqno. It executes the wsrep_sst_{wsrep_sst_method} command. This will be picked up by the chosen donor, which will start streaming data out to the joiner. A donor node has no way to refuse to serve the snapshot transfer once it has been selected, either by Galera group communication or by the value defined in the wsrep_sst_donor variable. Once the syncing has started and you want to revert the decision, there is no single command to stop the operation.

The basic principle when stopping an SST is to:

Make the joiner look dead from a Galera group communication point-of-view (shutdown, fence, block, reset, unplug cable, blacklist, etc)
Kill the SST processes on the donor

One would think that killing the innobackupex process (kill -9 {innobackupex PID}) on the donor would be enough, but that is not the case. If you kill the SST processes on the donor (or joiner) without fencing off the joiner, Galera can still see the joiner as active and will mark the SST process as incomplete, thus respawning a new set of processes to continue or start over again. You will be back to square one. This is the expected behaviour of the /usr/bin/wsrep_sst_{method} script, intended to safeguard the SST operation, which is vulnerable to timeouts (e.g., when it is long-running and resource-intensive).

Let’s look at an example. We have a crashed joiner node that we would like to rejoin the cluster. We would start by running the following command on the joiner:

$ systemctl start mysql # or service mysql start

A minute later, we found out that the operation was too heavy at that particular moment, and decided to postpone it until low-traffic hours. The most straightforward way to stop an xtrabackup-based SST is to simply shut down the joiner node and kill the SST-related processes on the donor node. Alternatively, you can block the incoming ports on the joiner by running the following iptables commands on the joiner:

$ iptables -A INPUT -p tcp --dport 4444 -j DROP
$ iptables -A INPUT -p tcp --dport 4567:4568 -j DROP

Then, on the donor, retrieve the PIDs of the SST processes (list the processes owned by the “mysql” user):

$ ps -u mysql
PID TTY TIME CMD
117814 ? 00:00:00 wsrep_sst_xtrab
120036 ? 00:00:06 innobackupex
120037 ? 00:00:07 socat
129515 ? 01:11:47 mysqld

Finally, kill them all except the mysqld process (you must be extremely careful to NOT kill the mysqld process on the donor!):

$ kill -9 117814 120036 120037

Then, in the donor’s MySQL error log, you should notice the following lines appearing after ~100 seconds:

2017-10-30 13:24:08 139722424837888 [Warning] WSREP: Could not find peer: 42b85e82-bd32-11e7-87ae-eff2b8dd2ea0
2017-10-30 13:24:08 139722424837888 [Warning] WSREP: 1.0 (192.168.55.172): State transfer to -1.-1 (left the group) failed: -32 (Broken pipe)

At this point, the donor should return to the “synced” state as reported by wsrep_local_state_comment and the SST process is completely stopped. The donor is back to its operational state and is able to serve clients in full capacity.
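A quick way to confirm this from the command line is to query the same status variable on the donor (a minimal check; the value should read “Synced” once the SST processes are gone):

$ mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"
# Expect the Value column to show: Synced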

For the cleanup process on the joiner, you can simply flush the iptables chain:

$ iptables -F

Or simply remove the rules with the -D flag:

$ iptables -D INPUT -p tcp --dport 4444 -j DROP
$ iptables -D INPUT -p tcp --dport 4567:4568 -j DROP

Related resources

How to Avoid SST when adding a new node to Galera Cluster for MySQL or MariaDB
Using the Galera Replication Window advisor to avoid SST
Updated – How to Bootstrap MySQL or MariaDB Galera Cluster

A similar approach can be used with other SST methods like rsync, mariabackup and mysqldump.

Throttling an SST (xtrabackup method only)

Depending on how busy the donor is, it’s a good approach to throttle the SST process so it won’t impact the donor significantly. We’ve seen a number of cases where, during catastrophic failures, users were desperate to bring back a failed cluster as a single bootstrapped node and let the rest of the members catch up later. This reduces the downtime from the application side; however, it creates an additional burden on this “one-node cluster” while the remaining members are still down or recovering.

Xtrabackup can be throttled with --throttle=<rate of IO/sec> to limit the number of IO operations if you are afraid it will saturate your disks, but this option is only applicable when running xtrabackup as a backup process, not as an SST operator. A similar option is available via rlimit (rate limit), and it can be combined with --use-memory to limit the RAM usage. By setting values under the [sst] directive inside the MySQL configuration file, we can ensure that the SST operation won’t put too much load on the donor, even though it may take longer to complete. On the donor node, set the following:

[sst]
rlimit=128k
inno-apply-opts="--use-memory=200M"

More details are available on the Percona XtraBackup SST documentation page.

However, there is a catch. The process could be so slow that it will never catch up with the transaction logs that InnoDB is writing, so SST might never complete. Generally, this situation is very uncommon, unless you really have a very write-intensive workload or you allocate very limited resources to SST.

Conclusions

SST is critical but heavy, and can potentially be a long-running operation depending on the dataset size and the network throughput between the nodes. Regardless of the consequences, there are still ways to stop the operation so that we can carry out a better recovery plan at a better time.

Tags: 

galera
sst
MySQL
MariaDB
troubleshooting

MariaDB Server 10.0.33 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.0.33. See the release notes and changelog for details and visit mariadb.com/dow…

Monyog MySQL Monitor v8.3.0: Introducing Trend Graph Analysis

Monyog MySQL Monitor v8.3.0 is a feature-rich release that addresses a large number of user requests: quick access to relevant monitoring information, ‘cross-plotting’ multiple servers in a unified chart, and more. Additionally, it includes a number of non-critical bug fixes.
Changes as compared to Monyog MySQL Monitor v8.2.0 include:
Features:

Added option to set a distinct email distribution list for warning alerts and critical alerts.
Trend Graph Analysis: Added option to group a single metric (which one you find most important) from different servers into one unified chart. This allows you to visually analyse a metric across servers at various points in time.

To use it, choose the monitor group, click the trend graph icon next to the metric, and select the trend graph corresponding to the required metric to monitor a single metric across servers.

Added option for disk monitoring of the system where Monyog is installed. In case the free space on the system where Monyog is installed goes below a defined threshold value, Monyog will raise an alert.
Added a REGEX-based filter to exclude unwanted long_running queries in Sniffer. Monyog will ignore queries satisfying the expression and neither kill nor send an alert for such.

Added option to filter queries based on poor indexes, missing indexes, errors and warnings. This feature is available for PERFORMANCE SCHEMA based sniffer in Query Analyzer and “Show details” page in Dashboard. When you choose a filter, Monyog will show queries based on the criteria set.

Redesigned Server selector GUI.
Redesigned Query Analyzer GUI:  a more intuitive view of the top 5 queries based on total time has been added.

Added Y-axis unit for charts, along with the option to define the units and unit-factors.

Bug Fixes:

Fixed bogus “UNIQUE constraint failed...” errors in MONyog.log.
In rare cases, the real-time page hung when switching between real-time sessions.
In the option to plot values based on data collection in History/Trend Analysis (for uptime counters), only the “Group By” option was available for those counters.
Monyog failed to recognise Aurora instances for OS monitoring. The four required fields for OS monitoring were not displayed.
Email addresses with “+” in monitor-level notification settings were not accepted.
Added “DB” as a column under ‘Manage Columns’ in the Dashboard’s ‘show details’ page in ‘Processlist’ mode.
Monyog showed the total time and average latency as “0” when the execution time of a query was less than a second for the table-based slow query log.
Fixed an issue with Custom SQL Object (CSO): Monyog logged entries like “not an error” in MONyog.log.
A server name with ‘&’ in it could not be deleted.
In some rare cases, Monyog failed to generate an alert if the network connection on the machine where MySQL is running went down.
Some monitors in the Linux monitor group showed NaN when doing History/Trend Analysis.

You can download a 14-day free trial of Monyog here.
Purchase Monyog: https://www.webyog.com/product/monyogpricing
The post Monyog MySQL Monitor v8.3.0: Introducing Trend Graph Analysis appeared first on Monyog Blog.

Percona XtraDB Cluster 5.7.19-29.22-3 is now available

Percona announces the release of Percona XtraDB Cluster 5.7.19-29.22-3 on October 27, 2017. Binaries are available from the downloads section or our software repositories. NOTE: You can also run Docker containers from the images in the Docker Hub repository. Percona XtraDB Cluster 5.7.19-29.22-3 is now the current release, based on the following: Percona Server 5.7.19-17 Galera […]

Percona XtraDB Cluster 5.6.37-26.21-3 is Now Available

Percona announces the release of Percona XtraDB Cluster 5.6.37-26.21-3 on October 27, 2017. Binaries are available from the downloads section or our software repositories. Percona XtraDB Cluster 5.6.37-26.21-3 is now the current release, based on the following: Percona Server for MySQL 5.6.37-82.2 Galera Replication library 3.21 wsrep API version 26 All Percona software is open-source and […]
