Month: May 2017

MariaDB 10.1.24 and Connector/C 2.3.3 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.24, and MariaDB Connector/C 2.3.3. See the release notes and changelogs for details. Download MariaDB 10.1.24 Release Notes Changelog What is MariaDB 10.1? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Connector/C 2.3.3 Release Notes Changelog About MariaDB Connector/C Thanks, and enjoy […]
The post MariaDB 10.1.24 and Connector/C 2.3.3 now available appeared first on MariaDB.org.

SQLskills SQL101: How can a SELECT cause a database to change?

As Kimberly blogged about recently, SQLskills is embarking on a new initiative to blog about basic topics, which we’re calling SQL101. We’ll all be blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 […]

The post SQLskills SQL101: How can a SELECT cause a database to change? appeared first on Paul S. Randal.

ProxySQL-Assisted Percona XtraDB Cluster Maintenance Mode

In this blog post, we’ll look at how Percona XtraDB Cluster maintenance mode uses ProxySQL to take cluster nodes offline without impacting workloads.
Percona XtraDB Cluster Maintenance Mode
Since Percona XtraDB Cluster offers a high availability solution, it must consider a data flow where a cluster node gets taken down for maintenance (through isolation from a cluster or complete shutdown).
Percona XtraDB Cluster facilitated this by introducing a maintenance mode. Percona XtraDB Cluster maintenance mode reduces the number of abrupt workload failures if a node is taken down using ProxySQL (as a load balancer).
The central idea is delaying the core node action and allowing ProxySQL to divert the workload.
How ProxySQL Manages Percona XtraDB Cluster Maintenance Mode
With Percona XtraDB Cluster maintenance mode, ProxySQL marks the node as OFFLINE when a user triggers a shutdown signal (or wants to put a specific node into maintenance mode):

When a user triggers a shutdown, the Percona XtraDB Cluster node sets pxc_maint_mode to SHUTDOWN (from the DISABLED default) and sleeps for x seconds (dictated by pxc_maint_transition_period, 10 seconds by default). ProxySQL auto-detects this change and marks the node as OFFLINE. With this change, ProxySQL avoids opening new connections for any DML transactions, but continues to service existing queries until pxc_maint_transition_period expires. Once the sleep period is complete, Percona XtraDB Cluster delivers the real shutdown signal, thereby giving ProxySQL enough time to transition the workload.

If the user needs to take a node into maintenance mode, they can simply set pxc_maint_mode to MAINTENANCE. With that, pxc_maint_mode is updated and the client connection updating it sleeps for x seconds (as dictated by pxc_maint_transition_period) before giving control back to the user. ProxySQL auto-detects this change and marks the node as OFFLINE. With this change, ProxySQL avoids opening new connections for any DML transactions but continues to service existing queries.

ProxySQL auto-detects this change in maintenance state and then automatically re-routes traffic, thereby reducing abrupt workload failures.
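
As an illustration, putting a node into maintenance and bringing it back could look like this (a minimal sketch; the first SET blocks for roughly pxc_maint_transition_period before returning):
mysql> SET GLOBAL pxc_maint_mode = MAINTENANCE; -- ProxySQL moves the node to OFFLINE_SOFT
-- ... perform the maintenance work ...
mysql> SET GLOBAL pxc_maint_mode = DISABLED;    -- the node becomes eligible for traffic again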

Technical Details:

The ProxySQL Galera checker script continuously monitors the state of individual nodes by checking the pxc_maint_mode variable status (in addition to the existing wsrep_local_state), using the ProxySQL scheduler feature.

The scheduler is a cron-like implementation integrated inside ProxySQL, with millisecond granularity.

If proxysql_galera_checker detects pxc_maint_mode = SHUTDOWN | MAINTENANCE, then it marks the node as OFFLINE_SOFT.  This avoids the creation of new connections (or workloads) on the node.

Sample proxysql_galera_checker log:
Thu Dec  8 11:21:11 GMT 2016 Enabling config
Thu Dec  8 11:21:17 GMT 2016 Check server 10:127.0.0.1:25000 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:17 GMT 2016 Check server 10:127.0.0.1:25100 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:17 GMT 2016 Check server 10:127.0.0.1:25200 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:17 GMT 2016 Changing server 10:127.0.0.1:25200 to status OFFLINE_SOFT due to SHUTDOWN
Thu Dec  8 11:21:17 GMT 2016 Number of writers online: 2 : hostgroup: 10
Thu Dec  8 11:21:17 GMT 2016 Enabling config
Thu Dec  8 11:21:22 GMT 2016 Check server 10:127.0.0.1:25000 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:22 GMT 2016 Check server 10:127.0.0.1:25100 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:22 GMT 2016 Check server 10:127.0.0.1:25200 , status OFFLINE_SOFT , wsrep_local_state 4
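
To confirm the state from the ProxySQL side, one could also query the admin interface (a sketch; assumes the default admin port 6032 and admin credentials):
mysql> SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;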
Ping us below with any questions or comments.

Get Your Oracle Certification Questions Answered

If you’ve contacted our Oracle Certification Support team recently, you may have noticed a difference in how you receive support. You now start your help journey in our Guided Assistance Tool.

Many of the requests we receive can be resolved with a simple answer to a question, giving candidates the opportunity to get their question answered immediately.

When you first access the tool, you’ll select a filter category. The most common questions that we receive under that category show on the right. Many times, this is all you need to get your question answered and move on. If you don’t see your specific question listed, select the question closest to yours. This may answer your question.

You’ll have the opportunity to log a ticket to get your question answered if you are unable to get a satisfactory answer in the guided assistance. 

The process to request assistance remains the same; we simply added a helpful step that may resolve your issue without the wait!

Migration from MySQL Master-Slave pair to MySQL InnoDB Cluster: howto

MySQL InnoDB Cluster (or Group Replication alone) is becoming more and more popular. The solution no longer attracts only experts. On social media, forums and other discussions, people are asking me what the best way is to migrate a running environment using traditional asynchronous replication [Master -> Slave(s)] to InnoDB Cluster.
The following procedure is what I currently recommend. The objective of these steps is to keep the downtime of the database service to a minimum.
We can divide the procedure into 9 steps:

the current situation
preparing the future cluster
data transfer
replication from current system
creation of the cluster with a single instance
adding instances to the cluster
configure the router
test phase
pointing the application to the new solution

1. the current situation

Our application connects to mysql1 which also acts as master for mysql2. mysql3 and mysql4 are spare servers that will be used for the new MySQL InnoDB Cluster.
The final architecture will be having a MySQL InnoDB Cluster group of 3 machines: mysql2, mysql3 and mysql4.
2. preparing the cluster
The current Master-Slave setup must use GTIDs (so at least MySQL 5.6).
MySQL >= 5.7.17 must be used for the InnoDB Cluster group members.
Read the Group Replication’s requirements & limitations:

official manual
MySQL Group Replication and Table Design
Savepoints and GR: here and here

So, on mysql3 and mysql4, we only need to install MySQL >=5.7.17.
There are also two different approaches to create such a cluster:

create it manually and then use MySQL Shell to create the metadata needed by MySQL-router
let’s do everything using MySQL Shell

We will of course use the second option.
3. data transfer
As of now, provisioning a new member is like provisioning a new slave for any other type of MySQL replication: a manual operation. Use a backup!
Group Replication is "just" another type of MySQL replication, therefore we need to use the same concept. Of course we understand that everybody would benefit from an automatic provisioning process, but we don't have such a solution yet.
The backup must be consistent and provide the GTID of the last transaction being part of the backup.
You can use whichever option you want: a logical backup with mysqldump, a physical backup with MEB or Xtrabackup, etc.
I will use MEB to illustrate the different operations.
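If MEB is not available, a logical backup works too; here is a rough mysqldump sketch (an assumption-laden example: it presumes GTIDs are already enabled on mysql1, and --set-gtid-purged=ON writes the needed SET @@GLOBAL.GTID_PURGED statement into the dump):
[mysql1 ~]# mysqldump --host=127.0.0.1 --user=root --password=X \
            --single-transaction --set-gtid-purged=ON --all-databases > /tmp/backup.sql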
backup:
Let’s take a backup on mysql1:
[mysql1 ~]# mysqlbackup --host=127.0.0.1 --backup-dir=/tmp/backup \
--user=root --password=X backup-and-apply-log
Of course we could have taken the backup from mysql2 too.
transfer:
We need to copy the backup from mysql1 to mysql3:
[mysql1 ~]# scp -r /tmp/backup mysql3:/tmp

restore:
Be sure that mysqld is not running on mysql3:
[mysql3 ~]# systemctl stop mysqld
Now it's time to restore the backup on mysql3; this consists of a simple copy-back and an ownership change:
[mysql3 ~]# mysqlbackup --backup-dir=/tmp/backup --force copy-back
[mysql3 ~]# rm /var/lib/mysql/mysql*-bin* # just some cleanup
[mysql3 ~]# chown -R mysql. /var/lib/mysql

4. replication from current system
At the end of this section we will then have just a normal asynchronous slave.
We need to verify MySQL’s configuration to be sure that my.cnf is configured properly to act as a slave:
[mysqld]

server_id=3
enforce_gtid_consistency = on
gtid_mode = on
log_bin
log_slave_updates

Let’s start mysqld:
[mysql3 ~]# systemctl start mysqld
Now, we need to find the latest GTIDs purged from the backup and set them. Then we will have to set up asynchronous replication and start it. We will then have live data from production replicated to this new slave.
Where to find this information depends on your backup solution.
Using MEB, the latest purged GTIDs are found in the file called backup_gtid_executed.sql :
[mysql3 ~]# cat /tmp/backup/meta/backup_gtid_executed.sql
# On a new slave, issue the following command if GTIDs are enabled:
SET @@GLOBAL.GTID_PURGED='33351000-3fe8-11e7-80b3-08002718d305:1-1002';

# Use the following command if you want to use the GTID handshake protocol:
# CHANGE MASTER TO MASTER_AUTO_POSITION=1;
Let’s connect to mysql on mysql3 and setup replication:
mysql> CHANGE MASTER TO MASTER_HOST="mysql1",
MASTER_USER="repl_async", MASTER_PASSWORD='Xslave',
MASTER_AUTO_POSITION=1;
mysql> RESET MASTER;
mysql> SET global gtid_purged="33351000-3fe8-11e7-80b3-08002718d305:1-1002";
mysql> START SLAVE;

The credentials used for replication must be present in mysql1.
Using SHOW SLAVE STATUS, you should be able to see that the new slave is replicating.
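A minimal check could look like this (output trimmed to the fields that matter; the values shown are simply what you want to see):
mysql> SHOW SLAVE STATUS\G
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0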

5. creation of the cluster with a single instance
It’s finally time to use MySQL Shell !
You can run THE shell using mysqlsh:
[mysql3 ~]# mysqlsh
Now we can verify if our server is ready to become a member of a new cluster:
mysql-js> dba.checkInstanceConfiguration('root@mysql3:3306')
Please provide the password for ‘root@mysql3:3306’:
Validating instance…

The instance ‘mysql3:3306’ is not valid for Cluster usage.

The following issues were encountered:

 – Some configuration options need to be fixed.

+----------------------------------+---------------+----------------+--------------------------------------------------+
| Variable                         | Current Value | Required Value | Note                                             |
+----------------------------------+---------------+----------------+--------------------------------------------------+
| binlog_checksum                  | CRC32         | NONE           | Update the server variable or restart the server |
| master_info_repository           | FILE          | TABLE          | Restart the server                               |
| relay_log_info_repository        | FILE          | TABLE          | Restart the server                               |
| transaction_write_set_extraction | OFF           | XXHASH64       | Restart the server                               |
+----------------------------------+---------------+----------------+--------------------------------------------------+

Please fix these issues, restart the server and try again.

{
    “config_errors”: [
        {
            “action”: “server_update”,
            “current”: “CRC32”,
            “option”: “binlog_checksum”,
            “required”: “NONE”
        },
        {
            “action”: “restart”,
            “current”: “FILE”,
            “option”: “master_info_repository”,
            “required”: “TABLE”
        },
        {
            “action”: “restart”,
            “current”: “FILE”,
            “option”: “relay_log_info_repository”,
            “required”: “TABLE”
        },
        {
            “action”: “restart”,
            “current”: “OFF”,
            “option”: “transaction_write_set_extraction”,
            “required”: “XXHASH64”
        }
    ],
    “errors”: [],
    “restart_required”: true,
    “status”: “error”
}
During this process, the configuration is parsed to see if all required settings are present.
By default, some settings are missing or need to be changed; we can ask the shell to perform the changes for us:
mysql-js> dba.configureLocalInstance()
Please provide the password for ‘root@localhost:3306’:

Detecting the configuration file…
Found configuration file at standard location: /etc/my.cnf

Do you want to modify this file? [Y|n]: y
Validating instance…

The configuration has been updated but it is required to restart the server.

{
    “config_errors”: [
        {
            “action”: “restart”,
            “current”: “FILE”,
            “option”: “master_info_repository”,
            “required”: “TABLE”
        },
        {
            “action”: “restart”,
            “current”: “FILE”,
            “option”: “relay_log_info_repository”,
            “required”: “TABLE”
        },
        {
            “action”: “restart”,
            “current”: “OFF”,
            “option”: “transaction_write_set_extraction”,
            “required”: “XXHASH64”
        }
    ],
    “errors”: [],
    “restart_required”: true,
    “status”: “error”
}
This command only modifies the configuration of the local instance (as the name of the function indicates). So when you need to configure multiple members of a cluster, you need to connect to each node independently and run it locally.
As the command output indicates, we now need to restart mysqld to enable the new configuration settings:
[mysql3 ~]# systemctl restart mysqld
We can now connect again with the shell, verify again the configuration and finally create the cluster:
mysql-js> \c root@mysql3:3306
Creating a Session to ‘root@mysql3:3306’
Enter password:
Your MySQL connection id is 6
Server version: 5.7.18-log MySQL Community Server (GPL)
No default schema selected; type \use <schema> to set one.
mysql-js> dba.checkInstanceConfiguration('root@mysql3:3306')
Please provide the password for ‘root@mysql3:3306’:
Validating instance…

The instance ‘mysql3:3306’ is valid for Cluster usage
{
    “status”: “ok”
}
mysql-js> cluster = dba.createCluster('MyInnoDBCluster')
A new InnoDB cluster will be created on instance ‘root@mysql3:3306’.

Creating InnoDB cluster ‘MyInnoDBCluster’ on ‘root@mysql3:3306’…
Adding Seed Instance…

Cluster successfully created. Use Cluster.addInstance() to add MySQL instances.
At least 3 instances are needed for the cluster to be able to withstand up to
one server failure.

<Cluster:MyInnoDBCluster>
We can verify the status of our single node cluster using once again the shell:
mysql-js> cluster.status()
{
    “clusterName”: “MyInnoDBCluster”,
    “defaultReplicaSet”: {
        “name”: “default”,
        “primary”: “mysql3:3306”,
        “status”: “OK_NO_TOLERANCE”,
        “statusText”: “Cluster is NOT tolerant to any failures.”,
        “topology”: {
            “mysql3:3306”: {
                “address”: “mysql3:3306”,
                “mode”: “R/W”,
                “readReplicas”: {},
                “role”: “HA”,
                “status”: “ONLINE”
            }
        }
    }
}

We now have a running cluster, but it is of course not yet tolerant to any failure.
This is what our architecture looks like now:

6. adding instances to the cluster
The goal is to have a Cluster of 3 nodes (or a Group of 3 Members). Now we will add mysql4 using the same backup we used for mysql3 and using the same procedure.
transfer:
[mysql1 ~]# scp -r /tmp/backup mysql4:/tmp
restore:
[mysql4 ~]# systemctl stop mysqld
[mysql4 ~]# mysqlbackup --backup-dir=/tmp/backup --force copy-back
[mysql4 ~]# rm /var/lib/mysql/mysql*-bin*  # just some cleanup
[mysql4 ~]# chown -R mysql. /var/lib/mysql
This time there is no need to modify the configuration manually; we will use the shell for that later. So we can simply start mysqld:
[mysql4 ~]# systemctl start mysqld
Let’s use the shell to join the Group:
mysql-js> \c root@mysql3:3306
Creating a Session to ‘root@mysql3:3306’
Enter password:
Your MySQL connection id is 27
Server version: 5.7.18-log MySQL Community Server (GPL)
No default schema selected; type \use <schema> to set one.
mysql-js> dba.checkInstanceConfiguration('root@mysql4:3306')
Please provide the password for ‘root@mysql4:3306’:
Validating instance…
The instance ‘mysql4:3306’ is not valid for Cluster usage.
The following issues were encountered:
– Some configuration options need to be fixed.
+----------------------------------+---------------+----------------+--------------------------------------------------+
| Variable                         | Current Value | Required Value | Note                                             |
+----------------------------------+---------------+----------------+--------------------------------------------------+
| binlog_checksum                  | CRC32         | NONE           | Update the server variable or restart the server |
| enforce_gtid_consistency         | OFF           | ON             | Restart the server                               |
| gtid_mode                        | OFF           | ON             | Restart the server                               |
| log_bin                          | 0             | 1              | Restart the server                               |
| log_slave_updates                | 0             | ON             | Restart the server                               |
| master_info_repository           | FILE          | TABLE          | Restart the server                               |
| relay_log_info_repository        | FILE          | TABLE          | Restart the server                               |
| transaction_write_set_extraction | OFF           | XXHASH64       | Restart the server                               |
+----------------------------------+---------------+----------------+--------------------------------------------------+

Please fix these issues, restart the server and try again.
{
    "config_errors": [
        {
            "action": "server_update",
            "current": "CRC32",
            "option": "binlog_checksum",
            "required": "NONE"
        },
        {
            "action": "restart",
            "current": "OFF",
            "option": "enforce_gtid_consistency",
            "required": "ON"
        },
        {
            "action": "restart",
            "current": "OFF",
            "option": "gtid_mode",
            "required": "ON"
        },
        {
            "action": "restart",
            "current": "0",
            "option": "log_bin",
            "required": "1"
        },
        {
            "action": "restart",
            "current": "0",
            "option": "log_slave_updates",
            "required": "ON"
        },
        {
            "action": "restart",
            "current": "FILE",
            "option": "master_info_repository",
            "required": "TABLE"
        },
        {
            "action": "restart",
            "current": "FILE",
            "option": "relay_log_info_repository",
            "required": "TABLE"
        },
        {
            "action": "restart",
            "current": "OFF",
            "option": "transaction_write_set_extraction",
            "required": "XXHASH64"
        }
    ],
    "errors": [],
    "restart_required": true,
    "status": "error"
}

We can let the shell configure it:
mysql-js> dba.configureLocalInstance()
Please provide the password for ‘root@localhost:3306’:

Detecting the configuration file…
Found configuration file at standard location: /etc/my.cnf

Do you want to modify this file? [Y|n]: y
Validating instance…

The configuration has been updated but it is required to restart the server.

{
    “config_errors”: [
        {
            “action”: “restart”,
            “current”: “OFF”,
            “option”: “enforce_gtid_consistency”,
            “required”: “ON”
        },
        {
            “action”: “restart”,
            “current”: “OFF”,
            “option”: “gtid_mode”,
            “required”: “ON”
        },
        {
            “action”: “restart”,
            “current”: “0”,
            “option”: “log_bin”,
            “required”: “1”
        },
        {
            “action”: “restart”,
            “current”: “0”,
            “option”: “log_slave_updates”,
            “required”: “ON”
        },
        {
            “action”: “restart”,
            “current”: “FILE”,
            “option”: “master_info_repository”,
            “required”: “TABLE”
        },
        {
            “action”: “restart”,
            “current”: “FILE”,
            “option”: “relay_log_info_repository”,
            “required”: “TABLE”
        },
        {
            “action”: “restart”,
            “current”: “OFF”,
            “option”: “transaction_write_set_extraction”,
            “required”: “XXHASH64”
        }
    ],
    “errors”: [],
    “restart_required”: true,
    “status”: “error”
}
Restart the service to enable the changes:
[mysql4 ~]# systemctl restart mysqld
We will again use the same purged GTIDs as before (remember, they are in /tmp/backup/meta/backup_gtid_executed.sql):
mysql-js> \c root@mysql4:3306
mysql-js> \sql
mysql-sql> RESET MASTER;
mysql-sql> SET global gtid_purged="33351000-3fe8-11e7-80b3-08002718d305:1-1002";
We used the shell here to illustrate that its SQL mode can be used too.
We are ready to add mysql4 to the Group:
mysql-sql> \js
mysql-js> dba.checkInstanceConfiguration('root@mysql4:3306')
Please provide the password for ‘root@mysql4:3306’:
Validating instance…

The instance ‘mysql4:3306’ is valid for Cluster usage
{
“status”: “ok”
}

Now we need to connect to a node that is already a member of the group to load the cluster object (get the cluster's metadata):
mysql-js> \c root@mysql3:3306
Creating a Session to ‘root@mysql3:3306’
Enter password:
Your MySQL connection id is 29
Server version: 5.7.18-log MySQL Community Server (GPL)
No default schema selected; type \use <schema> to set one.
mysql-js> cluster = dba.getCluster()
<Cluster:MyInnoDBCluster>
Now we can check if the node that we want to add is consistent with the transactions that have been applied (verify the GTIDs):
mysql-js> cluster.checkInstanceState('root@mysql4:3306')
Please provide the password for ‘root@mysql4:3306’:
Analyzing the instance replication state…

The instance ‘mysql4:3306’ is valid for the cluster.
The instance is fully recoverable.

{
“reason”: “recoverable”,
“state”: “ok”
}

This is perfect, we can add the new member (mysql4):
mysql-js> cluster.addInstance("root@mysql4:3306")
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.

Please provide the password for ‘root@mysql4:3306’:
Adding instance to the cluster …

The instance ‘root@mysql4:3306’ was successfully added to the cluster.
Let’s verify this:
mysql-js> cluster.status()
{
    “clusterName”: “MyInnoDBCluster”,
    “defaultReplicaSet”: {
        “name”: “default”,
        “primary”: “mysql3:3306”,
        “status”: “OK_NO_TOLERANCE”,
        “statusText”: “Cluster is NOT tolerant to any failures.”,
        “topology”: {
            “mysql3:3306”: {
                “address”: “mysql3:3306”,
                “mode”: “R/W”,
                “readReplicas”: {},
                “role”: “HA”,
                “status”: “ONLINE”
            },
            “mysql4:3306”: {
                “address”: “mysql4:3306”,
                “mode”: “R/O”,
                “readReplicas”: {},
                “role”: “HA”,
                “status”: “ONLINE”
            }
        }
    }
}
Great !
Now let’s add mysql2, where we don’t need any backup as the data is already present.
The first step on mysql2 is to stop the running slave thread(s) (io and sql) and then forget completely about this asynchronous replication:
mysql2 mysql> stop slave;
mysql2 mysql> reset slave all;
It's the right moment to add it to the cluster using MySQL Shell; first we need to check the configuration:
[mysql2 ~]# mysqlsh
mysql-js> \c root@mysql2:3306
mysql-js> dba.checkInstanceConfiguration('root@mysql2:3306')
mysql-js> dba.configureLocalInstance()
[mysql2 ~]# systemctl restart mysqld
and now we can add mysql2 to the cluster:
[mysql2 ~]# mysqlsh
mysql-js> \c root@mysql3:3306
mysql-js> dba.checkInstanceConfiguration('root@mysql2:3306')
mysql-js> cluster = dba.getCluster()
mysql-js> cluster.addInstance("root@mysql2:3306")
mysql-js> cluster.status()
{
    “clusterName”: “MyInnoDBCluster”,
    “defaultReplicaSet”: {
        “name”: “default”,
        “primary”: “mysql3:3306”,
        “status”: “OK”,
        “statusText”: “Cluster is ONLINE and can tolerate up to ONE failure.”,
        “topology”: {
            “mysql2:3306”: {
                “address”: “mysql2:3306”,
                “mode”: “R/O”,
                “readReplicas”: {},
                “role”: “HA”,
                “status”: “ONLINE”
            },
            “mysql3:3306”: {
                “address”: “mysql3:3306”,
                “mode”: “R/W”,
                “readReplicas”: {},
                “role”: “HA”,
                “status”: “ONLINE”
            },
            “mysql4:3306”: {
                “address”: “mysql4:3306”,
                “mode”: “R/O”,
                “readReplicas”: {},
                “role”: “HA”,
                “status”: “ONLINE”
            }
        }
    }
}
It's possible to run dba.configureLocalInstance() on a running node to add the Group Replication settings to my.cnf.
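A minimal sketch of doing that, run locally on each member:
[mysql3 ~]# mysqlsh
mysql-js> \c root@localhost:3306
mysql-js> dba.configureLocalInstance()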
7. configure the router
At this point the architecture looks like this:

Let's configure mysql-router on mysql1. In fact, the router can bootstrap itself using the cluster's metadata; it only needs access to one of the members:
[root@mysql1 ~]# mysqlrouter --bootstrap mysql3:3306 --user mysqlrouter
Please enter MySQL password for root:
WARNING: The MySQL server does not have SSL configured and metadata used by the router may be transmitted unencrypted.

Bootstrapping system MySQL Router instance…
MySQL Router  has now been configured for the InnoDB cluster ‘MyInnoDBCluster’.

The following connection information can be used to connect to the cluster.

Classic MySQL protocol connections to cluster 'MyInnoDBCluster':
- Read/Write Connections: localhost:6446
- Read/Only Connections: localhost:6447

X protocol connections to cluster 'MyInnoDBCluster':
- Read/Write Connections: localhost:64460
- Read/Only Connections: localhost:64470
The configuration is done and the router will listen on 4 ports after starting it:
[mysql1 ~]# systemctl start mysqlrouter
8. test phase
Now you can check your cluster: how fast it processes the replication, run some read queries, etc.
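For instance, a quick sanity check through the router could look like this (hypothetical commands; 6446 and 6447 are the R/W and R/O ports from the bootstrap output above):
[mysql1 ~]# mysql -h 127.0.0.1 -P 6446 -u root -p -e "SELECT @@hostname"
[mysql1 ~]# mysql -h 127.0.0.1 -P 6447 -u root -p -e "SELECT @@hostname"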
As soon as you are happy with your tests, you just need to point the application to the router and you're done!
9. pointing the application to the new solution
Pointing the application to the router is the only downtime in our story, and it is very short.
This is the final architecture:

Enjoy MySQL InnoDB Cluster !
 
 

Sysbench, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8

In this post I share results from sysbench with an IO-bound workload on Intel NUC servers. My previous post shared results from an in-memory workload. This is part of my work from bug 86215 as I identify CPU performance regressions from MySQL 5.6 to 5.7 and 8.

tl;dr
The results here are similar to, but not as bad as, the results from the in-memory workload. This is reasonable because CPU regressions can be hidden by IO-bound tests. But if you scroll to the bottom of this page and look at the graph for the point-query test you will see there are problems even for IO-bound tests.
For the i5 NUC most of the regression is from 5.6 to 5.7
For the i3 NUC, MySQL 5.7 did better especially on range scans but there is still a regression from 5.6 to 8. From many tests I have run it looks like someone did great work in MySQL 5.7 to make range scans more efficient in InnoDB.
For long scans the overhead from the default charset/collation in MySQL 8 is significant compared to latin1/latin1_swedish_ci.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I repeated the test: first with the default charset/collation and then with latin1/latin1_swedish_ci.
My servers are described here. The charts below refer to the servers as i3 NUC and i5 NUC. Compared to the i3 NUC, the i5 NUC is newer, has a CPU that is 2X faster, 2X more RAM and storage that is more than 2X faster. My usage of sysbench, including the tests run, and the my.cnf used for each database engine is described here. The my.cnf files I share were for the i3 NUC; for the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 40M rows per table. The database is larger than RAM, although in future tests I might need to use larger tables for the i5 NUC, which has twice the memory of the i3 NUC.

Results

I first ran these tests on the i3 NUC and then was curious about performance on a more modern CPU, so I set up a cluster of new i5 NUC servers. I am a big fan of Intel NUC: I have 6 at home and have yet to trip a circuit breaker, overheat my office or suffer from too much fan noise. The results on the i5 NUC are different than on the i3 NUC. On the i3 NUC MySQL 5.7 has much less of a regression from 5.6, while on the i5 NUC most of the regression is from MySQL 5.6 to 5.7. Perhaps one day I will have time to explain that, but I am too busy right now. The results are in the order in which the tests are run.

There is a small regression from 5.6 to 5.7/8 for update-index on the i5 NUC. This test requires secondary index maintenance as part of the update (read-modify-write of secondary index leaf pages).
The regression on the i5 NUC for update-nonindex is larger than for update-index. This test is less IO-bound than update-index because secondary index maintenance is not needed. The regression for delete is worse on the i5 NUC; in fact, there isn't a regression on the i3 NUC.
The regression for write-only is worse on the i5 NUC, and there isn't a regression on the i3 NUC. For read-write the regression is worse on the i5 NUC. MySQL 5.7 does better than 5.6 on the i3 NUC, and that difference is larger for the larger range scan (10,000 rows) than the smaller one (100 rows).
For read-only, using latin1/latin1_swedish_ci improves QPS for the 1,000 and 10,000 row range scans with MySQL 8 compared to the default charset/collation. Again, MySQL 5.7 does great on the i3 NUC and does better as the range scan gets longer. MySQL 8 has regressions on both the i3 and i5 NUC, and that isn't explained by charset.

For point-query the regression on the i3 NUC starts with MySQL 8, and on the i5 NUC it starts with MySQL 5.7. For insert the regression is larger on the i5 NUC.

Kafka Replication from MySQL and Oracle

Hello again everybody.
Well, I promised it a couple of weeks ago, and I’m sorry it has been so long (I’ve been working on other fun stuff in addition to this). But I’m pleased to say that we now have a fully working applier that takes data from an incoming THL stream, whether that is Oracle or MySQL, and converts that into a JSON document and message for distribution over a Kafka topic.
Currently, the configuration is organised with the following parameters:

The topic name is set according to the incoming schema and table. You can optionally add a prefix. So, for example, if you have a table ‘invoices’ in the schema ‘sales’, your Kafka topic will be sales_invoices, or if you’ve added a prefix, ‘myprefix_schema_table’.
Data is marshalled into a JSON document as part of the message, and the structure is to have a bunch of metadata and then an embedded record. You’ll see an example of this below. You can choose what metadata is included here. You can also choose to send everything on a single topic. I’m open to suggestions on whether it would be useful for this to be configured on a more granular level.
The msgkey is composed of the primary key information (if we can determine it), or the sequence number otherwise.
Messages are generated one row of source data at a time. There were lots of ways we could have done this, especially with larger single dumps/imports/multi-million-row transactions, but there is no more sensible way. It may mean we get duplicate messages into Kafka, but those are potentially easier to handle than trying to send a massive 10GB Kafka message.
Since Zookeeper is a requirement for Kafka, we use Zookeeper to record the replicator status information.

Side note: One way I might consider mitigating that last item (and which may also apply to some of our other upcoming appliers, such as the ElasticSearch applier) is to actually change the incoming THL stream so that it is split into individual rows. This sounds entirely crazy, since it would separate the incoming THL sequence number from the source (MySQL binlog, Oracle, er, other upcoming extractors), but it would mean that we have THL on the applier side which is a single row of data. That means we would have a THL seqno per row of data, but would also mean that in the event of a problem, the replicator could restart from that one row of data, rather than restarting from the beginning of a multi-million-row transaction.
Anyway, what does it all look like in practice?
Well, here’s a simple MySQL instance and I’m going to insert a row into this table:
mysql> insert into sbtest.sbtest values (0,100,"Base Msg","Some other submsg");
OK, this looks like this:
mysql> select * from sbtest.sbtest where k = 100;
+--------+-----+----------+-------------------+
| id     | k   | c        | pad               |
+--------+-----+----------+-------------------+
| 255759 | 100 | Base Msg | Some other submsg |
+--------+-----+----------+-------------------+
Over in Kafka, let’s have a look what the message looks like. I’m just using the console consumer here:
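That invocation would be something along these lines (a sketch; the broker address is an assumption, and the topic name follows the schema_table convention described above):
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sbtest_sbtest --from-beginning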
{"_meta_optype":"INSERT","_meta_committime":"2017-05-27 14:27:18.0","record":{"pad":"Some other submsg","c":"Base Msg","id":"255759","k":"100"},"_meta_source_table":"sbtest","_meta_source_schema":"sbtest","_meta_seqno":"10130"}
And let’s reformat that into something more useful:
{
   "_meta_committime" : "2017-05-27 14:27:18.0",
   "_meta_source_schema" : "sbtest",
   "_meta_seqno" : "10130",
   "_meta_source_table" : "sbtest",
   "_meta_optype" : "INSERT",
   "record" : {
      "c" : "Base Msg",
      "k" : "100",
      "id" : "255759",
      "pad" : "Some other submsg"
   }
}
 
Woohoo! Kafka JSON message. We’ve got the metadata (and those field names/prefixes are likely to change), but we’ve also got the full record details. I’m still testing other data types and ensuring we get the data through correctly, but I don’t foresee any problems.
There are a couple of niggles still to be resolved:

The Zookeeper interface used to store state data needs addressing; although it works fine, there are some occasional issues with key/path collisions.
The Zookeeper and Kafka connections are not fully checked, so the applier can appear to be up and running when no connection is available.
Some further tweaking of the configuration would be helpful, for example setting or implying specific formats for the msgkey and the embedded data.

I may add further configurations for other items, especially since longer term we might have a Kafka extractor and maybe we want to use that to distribute records, in which case we might want to track other information like the additional metadata and configuration (SQL mode etc) currently held within the THL. I’ll keep thinking about that though.
Anything else people would like to see here? Please email me at mc.brown@continuent.com and we’ll sort something out.
 

Using SR-IOV Technology on Oracle Engineered Systems

Today, on the x86 architecture, there are three common approaches to network I/O virtualization:

  •     Software-based I/O sharing through the virtualization layer (for example Xen, which OVM uses)
  •     Direct device assignment
  •     Hardware-based SR-IOV (Single-Root I/O Virtualization)

Software-based I/O sharing

This is the approach people think of first, and it is the one most commonly used by today's commercial virtualization products on the x86 architecture. The server virtualization software (the hypervisor, for example Xen) sits between the guest operating system's device drivers and the actual server hardware. Put simply, the guest OS device drivers first interact with the hypervisor, and the hypervisor then interacts with the underlying physical hardware to complete the I/O operation. The advantage of this approach is that it allows a single physical device to be shared among multiple virtual machines: the hypervisor makes each guest OS believe it owns its own hardware device, while in reality they all share the same physical device. The hypervisor does all of this work, managing conflicts, ensuring all virtual machines can access the I/O device concurrently, and keeping everything running stably. The drawback is that the hypervisor provides this capability at the cost of some performance: the virtualization layer needs at least some time to handle flow control and I/O translation. The more concurrent I/O the virtual machines generate, the busier the hypervisor becomes, which can potentially affect system stability. Although with today's virtualization hardware and software the impact of software-based I/O sharing on performance and stability is small, in scenarios with extremely demanding performance requirements the impact is still clearly visible.

Direct device assignment

Unlike software-based I/O sharing, where each virtual machine talks to the hypervisor and the hypervisor then talks to the physical hardware, direct device assignment allows the guest operating system to access the physical I/O device directly, without going through the hypervisor first. This removes the performance bottleneck and potential instability introduced by the hypervisor layer: by assigning an I/O device directly to a virtual machine, the VM gets predictable and stable I/O performance, just as a bare-metal operating system accessing its own device would. The drawback is that, because the virtualization layer is bypassed, the hypervisor can no longer virtualize the I/O, that is, it cannot let multiple virtual machines share one I/O device. If a physical machine has only two network cards, you can run at most two virtual machines on it, with one NIC assigned to each, which is not very economical.

The ideal solution would let virtual machines share the same physical I/O device while still delivering the high-performance, highly stable I/O of direct device assignment. SR-IOV was created for exactly this purpose, and it has been introduced into the Oracle engineered systems that use virtualization.

Single-Root I/O Virtualization

SR-IOV is a hardware-based I/O virtualization solution that provides higher I/O performance and scalability. SR-IOV requires hardware support: by defining multiple lightweight VFs (Virtual Functions) on a network card and assigning those VFs to virtual machines, the VMs can bypass the hypervisor and access the NIC directly while still sharing the same physical network device.

SR-IOV is a relatively new industry standard. The PCI-SIG organization defines and maintains the SR-IOV specification, which can be found at:

http://www.pcisig.com.

An SR-IOV device exposes two types of functions:

  • Physical Function (PF)

       A PF is a full-featured PCIe function; it can be discovered, managed and manipulated like any other ordinary PCIe device. A PF has a complete configuration space, can be used to configure and control the PCIe device, and has the full ability to move data into and out of the device. Each PCIe device can have from one up to eight PFs. Every PF is independent and, from the software's point of view, appears as a separate PCIe device.

  • Virtual Function (VF)

       A VF is a lightweight PCIe function used only to move data into and out of the device. Each VF is associated with an underlying PF, and each PF can have zero or more VFs. VFs are created through the PF. Each VF can be assigned to a virtual machine, and from the virtual machine's point of view the VF is a local physical network device.

So with SR-IOV, multiple virtual devices (VFs) are created from a single physical device at the network I/O hardware level. Each VF is assigned to a virtual machine, and when the VM accesses the VF directly it is effectively accessing the hardware directly, achieving performance close to that of a host accessing a local device. On Oracle engineered systems, each InfiniBand card can be shared by up to 63 virtual machines, i.e. it supports 63 VFs.
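
As a rough illustration of how VFs appear from a Linux host (generic Linux commands, not specific to Oracle engineered systems; the interface name ib0 and the VF count are assumptions):
# echo 8 > /sys/class/net/ib0/device/sriov_numvfs   # ask the PF driver to create 8 VFs
# lspci | grep -i "virtual function"                # the new VFs show up as additional PCIe functions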

 

DNS Infrastructure at GitHub

At GitHub we recently revamped how we do DNS from the ground up. This included both how we interact with external DNS providers and how we serve records internally to our hosts. To do this, we had to design and build a new DNS infrastructure that could scale with GitHub’s growth and across many data centers.

Previously GitHub's DNS infrastructure was fairly simple and straightforward. It included a local, forwarding-only DNS cache on every server and a pair of hosts that acted as both caches and authorities used by all these hosts. These hosts were available both on the internal network and the public internet. We configured zone stubs in the caching daemon to direct queries locally rather than recurse on the internet. We also had NS records set up at our DNS providers that pointed specific internal zones to the public IPs of this pair of hosts for queries external to our network.

This configuration worked for many years but was not without its downsides. Many applications are highly sensitive to resolving DNS queries, and any performance or availability issues we ran into would cause queuing and degraded performance at best and customer-impacting outages at worst. Configuration and code changes can cause large unexpected changes in query rates. As such, scaling beyond these two hosts became an issue. Due to the network configuration of these hosts we would just need to keep adding IPs and hosts, which has its own problems. While attempting to firefight and remediate these issues, the old system made it difficult to identify causes due to a lack of metrics and visibility. In many cases we resorted to tcpdump to identify the traffic and queries in question. Another issue was that by running on public DNS servers we ran the risk of leaking internal network information. As a result we decided to build something better and began to identify our requirements for the new system.

We set out to design a new DNS infrastructure that would improve the aforementioned operational issues, including scaling and visibility, as well as introduce some additional requirements. We wanted to continue to run our public DNS zones via external DNS providers, so whatever system we built needed to be vendor agnostic. Additionally, we wanted this system to be capable of serving both our internal and external zones, meaning internal zones were only available on our internal network unless specifically configured otherwise, and external zones were resolvable without leaving our internal network. We wanted the new DNS architecture to allow both a deploy-based workflow for making changes as well as API access to our records for automated changes via our inventory and provisioning systems. The new system could not have any external dependencies; too much relies on DNS functioning for it to get caught in a cascading failure. This includes connectivity to other data centers and DNS services that may reside there. Our old system mixed the use of caches and authorities on the same host; we wanted to move to a tiered design with isolated roles. Lastly, we wanted a system that could support many data center environments, whether EC2 or bare metal.

Implementation

To build this system we identified three classes of hosts: caches, edges, and authorities. Caches serve as recursive resolvers and DNS "routers", caching responses from the edge tier. The edge tier, running a DNS authority daemon, responds to queries from the caching tier for zones it is configured to zone-transfer from the authority tier. The authority tier serves as hidden DNS masters, our canonical source for DNS data, servicing zone transfers from the edge hosts as well as providing an HTTP API for creating, modifying or deleting records.

In our new configuration, caches live in each data center meaning application hosts don’t need to traverse a data center boundary to retrieve a record. The caches are configured to map zones to the edge hosts within their region in order to route our internal zones to our own hosts. Any zone that is not explicitly configured will recurse on the internet to resolve an answer.

The edge hosts are regional hosts, living in our network edge PoPs (Point of Presence). Our PoPs have one or more data centers that rely on them for external connectivity, without the PoP the data center can’t get to the internet and the internet can’t get to them. The edges perform zone transfers with all authorities regardless of what region or location they exist in and store those zones locally on their disk.

Our authorities are also regional hosts, each containing only the zones applicable to its region. Our inventory and provisioning systems determine which regional authority a zone lives in and create and delete records via an HTTP API as servers come and go. OctoDNS maps zones to regional authorities and uses the same API to create static records and to ensure dynamic sources are in sync. We have an additional, separate authority for external domains, such as github.com, to allow us to query our external domains during a disruption to connectivity. All records are stored in MySQL.
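
For a sense of what such an API-driven change can look like, here is a rough sketch against the stock PowerDNS HTTP API (PowerDNS is what runs on the authorities, as noted below; the host, API key and record shown are invented for illustration):
$ curl -s -X PATCH http://authority.example:8081/api/v1/servers/localhost/zones/github.net. \
    -H 'X-API-Key: REDACTED' \
    -H 'Content-Type: application/json' \
    --data '{"rrsets": [{"name": "host1.github.net.", "type": "A", "ttl": 60, "changetype": "REPLACE", "records": [{"content": "10.0.0.10", "disabled": false}]}]}'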

Operability

One huge benefit of moving to a more modern DNS infrastructure is observability. Our old DNS system had little to no metrics and limited logging. A large factor in deciding which DNS servers to use was the breadth and depth of metrics they produce. We finalized on Unbound for the caches, NSD for the edge hosts and PowerDNS for the authorities, all of which have been proven in DNS infrastructures much larger than at GitHub.

When running in our bare metal data centers, caches are accessed via a private anycast IP resulting in it reaching the nearest available cache host. The caches have been deployed in a rack aware manner that provides some level of balanced load between them and isolation against some power and network failure modes. When a cache host fails, servers that would normally use it for lookups will now automatically be routed to the next closest cache, keeping latency low as well as providing tolerance to some failure modes. Anycast allows us to scale the number of caches behind a single IP address unlike our previous configuration, giving us the ability to run as many caching hosts as DNS demand requires.

Edge hosts perform zone transfers with the authority tier, regardless of region or location. Our zones are not large enough that keeping a copy of all of them in every region is a problem. This means for every zone, all caches will have access to a local edge server with a local copy of all zones even when a region is offline or upstream providers are having connectivity issues. This change alone has proven to be quite resilient in the face of connectivity issues and has helped keep GitHub available during failures that not long ago would have caused customer facing outages.

These zone transfers include both our internal and external zones from their respective authorities. As you might guess zones like github.com are external and zones like github.net are generally internal. The difference between them is only the types of use and data stored in them. Knowing which zones are internal and external gives us some flexibility in our configuration.

$ dig +short github.com
192.30.253.112
192.30.253.113

Public zones are sync'd to external DNS providers and are the records GitHub users use every day. Additionally, public zones are completely resolvable within our network without needing to communicate with our external providers. This means any service that needs to look up api.github.com can do so without needing to rely on external network connectivity. We also use the stub-first configuration option of Unbound, which gives a lookup a second chance: if our internal DNS service is down for some reason, the query is retried externally.
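
A minimal sketch of the kind of Unbound configuration this implies (zone and address are invented for illustration; stub-first tells Unbound to retry with normal recursion if the stub servers fail):
stub-zone:
    name: "github.com"
    stub-addr: 10.127.0.53
    stub-first: yes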

$ dig +short time.github.net
10.127.6.10

Most of the github.net zone is completely private, inaccessible from the internet and only contains RFC 1918 IP addresses. Private zones are split up per region and site. Each region and/or site has a set of sub-zones applicable to that location, sub-zones for management network, service discovery, specific service records and yet to be provisioned hosts that are in our inventory. Private zones also include reverse lookup zones for PTRs.

Conclusion

Replacing an old system with a new one that is ready to serve millions of customers is never easy. Using a pragmatic, requirements based approach to designing and implementing our new DNS system resulted in a DNS infrastructure that was able to hit the ground running and will hopefully grow with GitHub into the future.

Want to help the GitHub SRE team solve interesting problems like this? We’d love for you to join us. Apply Here

Three Reasons Becoming an Oracle Certified Master Should Be On Your Radar

Our program has three core levels: Oracle Certified Associate, Oracle Certified Professional and Oracle Certified Master. Each level represents a set of skills and capabilities tied to a defined job level.

  • The Oracle Certified Associate is our entry-level certification and demonstrates fundamental skills and foundational knowledge.
  • The Oracle Certified Professional demonstrates advanced-level skill and the ability to apply strong knowledge.
  • The Oracle Certified Master performs hands-on tasks to solve real-world problems in a performance-based environment.

Oracle Certified Masters can be either Database technologists or Java Enterprise Architects.

So why do we think reaching the highest level of certification should be on your radar?

  1. The certification speaks for itself.
  2. Join and collaborate with other certified masters with whom you share a common interest and dedication.
  3. We're proud of these folks who have invested a lot of their time to achieve our highest level of certification. View the Oracle Certified Masters who have added their profiles to our Community page.

If that’s not reason enough, hear what other IT industry certification vendors have to say about Earning the Hardest Tech Certifications in this article published by Insight in January 2017.  All agree that experience or training alone doesn’t stack up to possessing a stamp that signifies your readiness to perform at the highest level.  

Oracle's Yvonne Préfontaine (Senior Manager, Java Exam Development) is quoted in the article: "At Oracle, the Holy Grail of certifications goes by another name: the Oracle Certified Master/Java Enterprise Architect. 'While Oracle recommends industry experience as being a critical component to obtaining a passing score, we do not require it,' said Yvonne Préfontaine, Senior Manager of Oracle Certification Exam Development, replying via e-mail. 'Nonetheless, if a candidate does not have considerable programming skill developing EE solutions, it will be immediately noticed.'"

What level of skill do you want to be noticed for?

Someone famous once said "it's the journey, not the destination", although in this case I wouldn't agree. Whichever path you are on, getting there is only half the fun.
