Partial (Optimised) JSON updates and Replication in MySQL 8.0

           MySQL is the most widely used  Relational Database Management System in the open source world. MySQL stepped into the NoSQL world by introducing the JSON Data Type in MySQL 5.7 release. In this blog post I am going to explain one of the major advantage of optimisation made in JSON Replication .

This was done from the MySQL 8.0.3 release.

What happened before 8.0.3 ?

Before MySQL 8.0.3, Whatever changes (Updates) made in JSON document, it will log as a full document in binary log & replicate the same into slave. The JSON data is stored as a blob internally. Below is an example of how it is logged as full document in binary log ?

Example –

Server version - 5.7.22-log MySQL Community Server (GPL)

My Binlog settings,

log-bin
Binlog_format = ROW
Binlog_row_image = minimal

Creating table with JSON data type,

mysql> create table json_repl (id int primary key, emp_de json);
Query OK, 0 rows affected (0.01 sec)   

Inserting the values,

mysql> insert into json_repl values(1,'{"tag": "desk1 addr1 history1 grade1", "book": "table1 and emp_historicaldata", "emp_id": "aaaa1", "designation": "MySQL DBA", "position_grade": "A1 grade1 and MASTER in MySQL"}');
Query OK, 1 row affected (0.01 sec)

Updating the record with JSON function JSON_REPLACE.

mysql> update json_repl set emp_de = json_replace(emp_de,'$.emp_id','a1');
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1 Warnings: 0

Binlog view ,

mysqlbinlog -vvv --base64-output=DECODE-ROWS mydbopslabs13-bin.000011 > log57

cat log57

### UPDATE `json_store`.`json_repl`
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
### SET
###   @2='\x00\x05\x00<B2>\x00\x27\x00\x03\x00*\x00\x04\x00.\x00\x06\x004\x00\x0b\x00?\x00\x0e\x00\x0cM\x00\x0ci\x00\x0c<87>\x00\x0c<8A>\x00\x0c<94>\x00tagbookemp_iddesignationposition_grade\x1bdesk1 addr1 history1 grade1\x1dtable1 and emp_historicaldata\x02a1\x09MySQL DBA\x1dA1 grade1 and MASTER in MySQL' /* JSON meta=4 nullable=1 is_null=0 */
# at 666596
#180421 23:00:58 server id 1  end_log_pos 666627 CRC32 0xbf0a14ea       Xid = 138659
COMMIT/*!*/;

 

Here, the JSON_REPLACE function is used to update the value in emp_id field . But, it is logging the full document in binary log file .

What happened after 8.0.3 ?

From MySQL 8.0.3, MySQL community released the new variable to avoid the full document logging into the binary log . The new variable  binlog_row_value_options.  The valid options are PARTIAL_JSON or empty(‘’) . More information in the worklog (WL 2955)

Below, I have explained how the replication is differed from older versions .

Example –

MySQL Version - MySQL 8.0.3

My Binlog settings ( my.cnf )

log-bin
binlog_format=row
binlog_row_image=minimal
binlog_row_value_options=partial_json

Creating table with JSON data type,

mysql> create table json_repl (id int primary key, emp_de json);
Query OK, 0 rows affected (0.01 sec)   

Inserting the values,

mysql> insert into json_repl values(1,'{"tag": "desk1 addr1 history1 grade1", "book": "table1 and emp_historicaldata", "emp_id": "aaaa1", "designation": "MySQL DBA", "position_grade": "A1 grade1 and MASTER in MySQL"}');
Query OK, 1 row affected (0.01 sec)

Updating the record with JSON function JSON_REPLACE.

mysql> update json_repl set emp_de = json_replace(emp_de,'$.emp_id','a1');
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1 Warnings: 0

Binlog view ,

mysqlbinlog -vvv --base64-output=DECODE-ROWS mydbopslabs15-bin.000008 > log803

cat log803

### UPDATE `json_store`.`json_repl`
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
### SET
###   @2=JSON_REPLACE(@2, '$.emp_id', 'a1') /* JSON meta=4 nullable=1 is_null=0 */
# at 516165
#180421 23:02:27 server id 3  end_log_pos 516196 CRC32 0x43c90650       Xid = 127163
COMMIT/*!*/;

 

Now, we are can observe only the updated field in binlog . Hope, it clearly explains the optimisation in JSON .

How it Benefits in Production Environment ?

Of course, there are many benefits. One of the major advantage is less disk usage .

Lower disk usage (IO)  –

By avoiding to logging the full document, we can save the disk usage (IO). I am just curious to know how much disk we can save by this approach, so i have made an test with 1 core records in both MySQL 5.7 and MySQL 8.0.3 . Below i am sharing the findings and my test results as well .

I have initiated the below script for insert the record in both MySQL 5.7 & MySQL 8.0.3.

For INSERT,

[sakthi@mydbopslabs13 mysql]# cat json_insert.sh
#bin/bash

for i in `seq 2 10000000`;
do
mysql -e "insert into json_store.json_repl values($i,'{\"emp_id\": \"aaaa$i\", \"tag\": \"desk$i addr$i history$i grade$i\", \"book\": \"table$i and emp_historicaldata\", \"position_grade\": \"A$i grade$i and MASTER in MySQL\", \"designation\": \"MySQL DBA\"}');"
done

 

After inserting the records, i have decided to update every values based on primary key with the below script in both MySQL 5.7 and MySQL 8.0.3. Then we can finally analyze the binlog growth . 

[sakthi@mydbopslabs13 mysql]# cat json_update.sh

#/bin/bash
mysql -e "flush logs;"
for i in `seq 1 10000000`;
do
mysql -e "update json_store.json_repl set emp_de = json_replace(emp_de,'$.emp_id','a$i') where id=$i;"
done

 

I have analysed the binlog growth in both MySQL 5.7 & MySQL 8.0.3 post JOSN Update. It is really good .

Binlog Growth for update in MySQL 5.7,

rw-r-----. 1 mysql mysql 1.1G  Apr 21 20:05 mydbopslabs13-bin.0000017
rw-r-----. 1 mysql mysql 1.1G  Apr 21 20:34 mydbopslabs13-bin.0000018
rw-r-----. 1 mysql mysql 1.1G  Apr 21 21:02 mydbopslabs13-bin.0000019
rw-r-----. 1 mysql mysql 1.1G  Apr 21 21:29 mydbopslabs13-bin.0000020
rw-r-----. 1 mysql mysql 1.1G  Apr 21 21:57 mydbopslabs13-bin.0000021
rw-r-----. 1 mysql mysql 1.1G  Apr 21 22:31 mydbopslabs13-bin.0000022
rw-r-----. 1 mysql mysql 652M  Apr 21 23:00 mydbopslabs13-bin.0000023

Total – 6 GB 652 MB

Binlog Growth for update in MySQL 8.0,

-rw-r-----. 1 mysql mysql 1.1G  Apr 21 20:07 mydbopslabs15-bin.000012
-rw-r-----. 1 mysql mysql 1.1G  Apr 21 20:38 mydbopslabs15-bin.000013
-rw-r-----. 1 mysql mysql 1.1G  Apr 21 21:09 mydbopslabs15-bin.000014
-rw-r-----. 1 mysql mysql 1.1G  Apr 21 21:35 mydbopslabs15-bin.000015
-rw-r-----. 1 mysql mysql 1.1G  Apr 21 22:11 mydbopslabs15-bin.000016
-rw-r-----. 1 mysql mysql 74M   Apr 21 22:17 mydbopslabs15-bin.000017

Total – 5 GB 74 MB

Binlog Growth Graph –

20180422012106_hdFrom the analysis we have saved approximately 1.5 GB of disk space. It will be vary based on yours JSON data .

Less Network Traffic –

We can transfer the small amount of data over network instead of the full document . By this approach We can save the unwanted network usage in replication too.

Can avoid Replication Lag –

Replicating the big document (blob) might cause slower writes to the slave. So, there is a chance for replication lag.  We can avoid the replication  lag by this approach .

Limitations –

This only take effect with the below JSON functions. Without the below functions, it will log the full document even the partial_json enabled .

JSON_SET() 
JSON_REPLACE()
JSON_REMOVE()

Input column and target column should be the same .

It only affect the ROW based replication Environment . Statement based replication will always works as its nature .

Conclusion –

JSON is very good for store the NoSQL data . It is very good for read purpose . MySQL has introduced the JSON and keep improving it . Definitely, JSON will play an important role for making the MySQL better . At Mydbops, We are keep testing the new things on MySQL and related tools, will be coming back with new blog soon.

Image Courtesy : https://unsplash.com/photos/Ifk3WssHNRw

关注dbDao.com的新浪微博

扫码加入微信Oracle小密圈,了解Oracle最新技术下载分享资源

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569