Howto make MySQL point-in-time recovery faster ?

Before explaining how you can improve the speed for performing point-in-time recovery, let’s recall what is Point-In-Time Recovery and how it’s usually performed.

Point-in-Time Recovery, PTR

Point-In-Time recovery is a technique of restoring your data until a certain point (usually until an event that you would like that has never happened).

For example, a user did a mistake and you would like to recover your data up to that mistake to revert its effects like a drop table or a massive delete.

The usual technique consists to restore the last backup and replay the binary logs up to that unfortunate “event”.

So, as you might have already realized, backups and binary logs are required 😉

The main spread technique to replace those binary logs event is to use the `mysqlbinlog` command. However, depending on your workload, this process can be quick or slow, depending on how much data there is to process. Moreover, `mysqlbinlog` parses and dumps binary logs in a single thread, therefore sequentially. Imagine you do a daily backup at midnight and one of your user inconveniently deletes  some records at 23.59… you have almost a full day of binary logs to process to be able to perform the Point-in-Time Recovery.

Boost binary log processing

Instead of using the `mysqlbinlog` utility to process our MySQL events, in 5.6 and above we have the possibility to use the MySQL server to perform this operation.
In fact, we will use the slave SQL_thread… and as some of you might have realized it already… we could then process those binary logs in parallel using multiple worker threads !

Example

We have a single server running and it’s configured to generate binary logs.

Sysbench is running oltp on 8 tables using 8 threads while we will play on another table not touched by sysbench to make the example easier to follow.

mysql> create table myusers (id int auto_increment primary key, name varchar(20));
Query OK, 0 rows affected (0.47 sec)

mysql> insert into myusers values (0,'lefred'),(0,'kennito'),(0,'dim0');
Query OK, 3 rows affected (0.36 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> insert into myusers values (0,'flyer'),(0,'luis'),(0,'nunno');
Query OK, 3 rows affected (0.13 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> select * from myusers;
+----+---------+
| id | name    |
+----+---------+
| 1 | lefred   |
| 2 | kennito  |
| 3 | dim0     |
| 4 | flyer    |
| 5 | luis     |
| 6 | nunno    |
+----+---------+
6 rows in set (0.05 sec)

Time for a backup ! Let’s use MEB:

[root@mysql1 mysql]# /opt/mysql/meb-4.1/bin/mysqlbackup --host=127.0.0.1 \
                  --backup-dir=/tmp/backup --user=root backup-and-apply-log

Backup is done, let’s go back to our table (sysbench is running):

mysql> insert into myusers values (0,'alfranio');
Query OK, 1 row affected (0.33 sec)

mysql> insert into myusers values (0,'vitor');
Query OK, 1 row affected (0.09 sec)

Then, oups…

delete from my users…. without where clause !

mysql> delete from myusers;
Query OK, 8 rows affected (0.23 sec)

and we don’t realize it directly, so we continue…

mysql> insert into myusers values (0,'pedro');
Query OK, 1 row affected (0.19 sec)

mysql> insert into myusers values (0,'thiago');
Query OK, 1 row affected (0.16 sec)

mysql> select * from myusers;
+----+--------+
| id | name   |
+----+--------+
| 9  | pedro  |
| 10 | thiago |
+----+--------+
2 rows in set (0.12 sec)

Ouch ! Let’s find what was the problem…

mysql> show master status;
+-------------------+-----------+--------------+------------------+----------------------------------------------+
| File              | Position  | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                            |
+-------------------+-----------+--------------+------------------+----------------------------------------------+
| mysql1-bin.000002 | 232930764 |              |                  | 7766037d-4d1e-11e7-8a51-08002718d305:1-46525 |
+-------------------+-----------+--------------+------------------+----------------------------------------------+
1 row in set (0.12 sec)

mysql> pager grep -A 1 -B 2 'sbtest.myusers' | grep -B 4 Delete
PAGER set to 'grep -A 1 -B 2 'sbtest.myusers' | grep -B 4 Delete'
mysql> show binlog events in 'mysql1-bin.000002';
--
| mysql1-bin.000002 | 195697832 | Gtid        | 1 | 195697904 | SET @@SESSION.GTID_NEXT= '7766037d-4d1e-11e7-8a51-08002718d305:25076' |
| mysql1-bin.000002 | 195697904 | Query       | 1 | 195697978 | BEGIN                                                                 |
| mysql1-bin.000002 | 195697978 | Table_map   | 1 | 195698041 | table_id: 203 (sbtest.myusers)                                        |
| mysql1-bin.000002 | 195698041 | Delete_rows | 1 | 195698168 | table_id: 203 flags: STMT_END_F                                       |
528101 rows in set (1.97 sec)

OK, we know which GTID we should avoid (`7766037d-4d1e-11e7-8a51-08002718d305:25076`).

Time to stop MySQL, copy our binary logs somewhere (I recommend to also stream binary logs to save a live copy) and restore the backup !

[root@mysql1 ~]# systemctl stop mysqld
[root@mysql1 mysql]# mkdir /tmp/binlogs/
[root@mysql1 mysql]# cp mysql1-bin.* /tmp/binlogs/

We are still in /var/lib/mysql 😉

[root@mysql1 mysql]# rm -rf *
[root@mysql1 mysql]# /opt/mysql/meb-4.1/bin/mysqlbackup --backup-dir=/tmp/backup copy-back
[root@mysql1 mysql]# chown -R mysql. *

It’s time to add some required settings in `my.cnf`:

replicate-same-server-id=1
skip-slave-start

We can now restart MySQL and start the PTR:

[root@mysql1 mysql]# systemctl start mysqld
...
mysql> select * from sbtest.myusers;
+----+---------+
| id | name    |
+----+---------+
|  1 | lefred  |
|  2 | kennito |
|  3 | dim0    |
|  4 | flyer   |
|  5 | luis    |
|  6 | nunno   |
+----+---------+
6 rows in set (0.10 sec)

OK we are back at the backup, so it’s time to perform the PTR:

mysql> SET @@GLOBAL.GTID_PURGED='7766037d-4d1e-11e7-8a51-08002718d305:25076';

It’s time to use our binary logs as relay logs, so the first thing to do is to copy those saved earlier and rename them according:

[root@mysql1 mysql]# for i in $(ls /tmp/binlogs/*.0*) 
do  
  ext=$(echo $i | cut -d'.' -f2); 
 cp $i mysql1-relay-bin.$ext; 
done

Make sure that all the new files are referenced in `mysql1-relay-bin.index`:

[root@mysql1 mysql]# ls ./mysql1-relay-bin.0* >mysql1-relay-bin.index
[root@mysql1 mysql]# chown mysql. *relay*
mysql> CHANGE MASTER TO RELAY_LOG_FILE='mysql1-relay-bin.000001', 
       RELAY_LOG_POS=1, MASTER_HOST='dummy';
Query OK, 0 rows affected (4.98 sec)

Performance

Now to benefit from replication’s internals, we will use parallel appliers.

If you don’t have your workload distributed in multiple databases, since 5.7, it’s better to use a different slave parallel type than the default value before starting the `SQL_THREAD` :

mysql> SET GLOBAL SLAVE_PARALLEL_TYPE='LOGICAL_CLOCK';
mysql> SET GLOBAL SLAVE_PARALLEL_WORKERS=8;

Now you can start the replication using the new relay logs:

mysql> START SLAVE SQL_THREAD;

It’s possible to monitor the parallel applying using the following query in performance_schema:

mysql> select * from performance_schema.replication_applier_status_by_worker\G

We can now check that we could rebuild our complete table just ignoring the bad transaction that was a mistake:

mysql> select * from sbtest.myusers;
+----+----------+
| id | name     |
+----+----------+
|  1 | lefred   |
|  2 | kennito  |
|  3 | dim0     |
|  4 | flyer    |
|  5 | luis     |
|  6 | nunno    |
|  7 | alfranio |
|  8 | vitor    |
|  9 | pedro    |
| 10 | thiago   |
+----+----------+

If for any reason you only want to recover until the wrong transaction and nothing after, it’s also possible, this is how to proceed after the backup’s restore (as it’s the same until then).

We need to start mysqld and copy again the binary logs as relay logs. But this time, no need to set any value to GTID_PURGED.

We setup replication as above but this time we start it differently using the keyword UNTIL:

mysql> CHANGE MASTER TO RELAY_LOG_FILE='mysql1-relay-bin.000001', 
       RELAY_LOG_POS=1, MASTER_HOST='dummy';
mysql> set global slave_parallel_type='LOGICAL_CLOCK';
mysql> SET GLOBAL SLAVE_PARALLEL_WORKERS=8;
mysql> START SLAVE SQL_THREAD UNTIL 
       SQL_BEFORE_GTIDS = '7766037d-4d1e-11e7-8a51-08002718d305:25076';

This time, we will replicate until that GTID and then stop the SQL_THREAD.

In both cases, don’t forget after having performed the PTR, to reset all slave information:

mysq> RESET SLAVE ALL;

Conclusion

Of course this is not (yet?) the standard way of doing PTR. Usually, people use mysqlbinlog and replay it through a MySQL client. But this is a nice hack that in some cases may save a lot of time.

关注dbDao.com的新浪微博

扫码加入微信Oracle小密圈,了解Oracle最新技术下载分享资源

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569