missing manual

Why Thread May Hang in “Waiting for table level lock” State – Part I

Last time I had to use gdb to check what’s going on in MySQL server and found something useful with it to share in the blog post it was April 2017, and I miss this kind of experience already. So, today I decided to try it again in a hope to get some insights in cases when other tools may not apply or may not be as helpful as one could expect. Here is the long enough story, based on recent customer issue I worked on this week. * * * Had you seen anything like this output of SHOW PROCESSLIST statement:Id User Host db Command Time State Info Progress…28 someuser01 xx.xx.xx.xx:39644 somedb001 Sleep 247 NULL 0.00029 someuser01 xx.xx.xx.yy:44100 somedb001 Query 276 Waiting for table level lock DELETE FROM t1 WHERE (some_id = ‘NNNNN’) AND … 0.000…33 someuser01 xx.xx.zz.tt:35886 somedb001 Query 275 Waiting for table level lock DELETE FROM t2 WHERE (some_id = ‘XXXXX’) 0.000…38 someuser01 xx.xx.xx.xx:57055 somedb001 Query 246 Waiting for table level lock DELETE FROM t3 WHERE (some_id in (123456)) AND … 0.000…recently? That is, many threads accessing InnoDB(!) tables and hanging in the “Waiting for table level lock” state for a long time without any obvious reason in the SHOW PROCESSLIST or SHOW ENGINE INNODB STATUS?I’ve seen it this week, along with a statement from customer that the server is basically unusable. Unlike in many other cases I had not found any obvious problem from the outputs mentioned above (other than high concurrency and normal InnoDB row lock waits etc for other sessions). There were no unexpected table level locks reported by InnoDB, and the longest running transactions were those in that waiting state.Unfortunately fine MySQL manual is not very helpful when describing this thread state:”Waiting for lock_type lock The server is waiting to acquire a THR_LOCK lock or a lock from the metadata locking subsystem, where lock_type indicates the type of lock. This state indicates a wait for a THR_LOCK: Waiting for table level lock

These states indicate a wait for a metadata lock:
Waiting for event metadata lock

Waiting for global read lock…”

What that “THR_LOCK” should tell the reader is beyond me (and probably most of MySQL users). Fortunately, I know from experience that this state usually means that thread is trying to access some MyISAM table. The tables mentioned in currently running statements were all InnoDB, though. Server had performance_schema enabled, but only with default instrumentation and consumers defined, so I had no chance to get anything related to metadata locks or much hope to see all previous statement executed by each thread anyway. I surely insisted on checking the source code etc, but this rarely gives immediate results.So, to check my assumption and try to prove MyISAM tables are involved, and get some hint what sessions they were locked by, I suggested to run mysqladmin debug, the command that is rarely used these days, with the output that Oracle explicitly refused to document properly (see my Bug #71346)! I do know what to expect there (from the days when MyISAM was widely used), and when in doubts I can always check the source code.The mysqladmin debug command outputs a lot of information into the error log. Among other things I found the following section:Thread database.table_name Locked/Waiting Lock_type…928 somedb001.somewhat_seq Waiting – write High priority write lock…12940 somedb001.somewhat_seq Locked – write High priority write lockwhere somedb001.somewhat_seq surely was the MyISAM table. There were actually many tables mentioned as locked, with waits on them, and names ending with _seq suffix. One can now check what MySQL thread with id=12940 is doing at all levels, where it comes from, should it be killed etc.Now it became more clear what’s going on. Probably application developers tried to implement sequences with MyISAM tables and, probably, triggers to use them when data are not provided for the columns! The idea is quite popular and described in many blog posts. Check this old blog post by Peter Zaitsev, for example. This assumption was confirmed, and essentially they did something like I did in the following deliberately primitive example:– this is our sequence tablemysql> show create table misam\G*************************** 1. row ***************************       Table: misamCreate Table: CREATE TABLE `misam` (  `id` int(11) NOT NULL,  PRIMARY KEY (`id`)) ENGINE=MyISAM DEFAULT CHARSET=utf8mb41 row in set (0.00 sec)mysql> create table inno like misam;Query OK, 0 rows affected (0.30 sec)mysql> alter table inno engine=InnoDB;Query OK, 0 rows affected (2.73 sec)Records: 0  Duplicates: 0  Warnings: 0mysql> show create table inno\G*************************** 1. row ***************************       Table: innoCreate Table: CREATE TABLE `inno` (  `id` int(11) NOT NULL,  PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb41 row in set (0.03 sec)mysql> select * from misam;Empty set (0.00 sec)– let’s set up the first value for the sequencemysql> insert into misam values(1);Query OK, 1 row affected (0.05 sec)– now let’s create trigger to insert the value from sequencemysql> delimiter //mysql> create trigger tbi before insert on inno    -> for each row    -> begin    -> if ifnull(new.id,0) = 0 then    ->   update misam set id=last_insert_id(id+1);    ->   set new.id = last_insert_id();    -> end if;    -> end    -> //Query OK, 0 rows affected (0.25 sec)mysql> delimiter ;mysql> start transaction;Query OK, 0 rows affected (0.00 sec)mysql> insert into inno values(0);Query OK, 1 row affected (0.20 sec)mysql> insert into inno values(0);Query OK, 1 row affected (0.10 sec)mysql> select * from inno;+—-+| id |+—-+|  2 ||  3 |+—-+2 rows in set (0.00 sec)mysql> select * from misam;+—-+| id |+—-+|  3 |+—-+1 row in set (0.00 sec)So, this is how it is supposed to work, and, you know, it works from two concurrent sessions without any locking noticed etc.This is the end of the real story, for now. We have a lot of things to discuss there with customer, including how to find out previous commands executed in each sessions, what useful details we can get from the performance_schema etc. The root cause was clear, but we still have to find out why the sessions are spending so much time holding the blocking locks on MyISAM tables and what we can do about that, or with architecture based on such implementation of sequences (that I kindly ask you to avoid whenever possible! Use auto_increment values, please, if you care about concurrency.) I expect more blog posts eventually inspired by that real story, but now it’s time to move to more generic technical details. * * *I have two problems with this story if I try to approach it in a more generic way.First, do we have any other way to see MyISAM table level locks, besides mysqladmin debug command? (Oracle kindly plans to replace with something, let me quote Shane Bester’s comment the bug that originates from hist internal feature request: “We need to replace it with appropriate performance_schema or information_schema tables.”)Second, how exactly to reproduce the situation customer reported? My initial attempts to get such a status with the trigger in place failed, I was not able to capture threads spending any notable time in “Waiting for table level lock” state even with concurrent single row inserts and the trigger above in place. I was thinking about explicit LOCK TABLES misam WRITE etc, but I doubt real code does use that. Fortunately, the following INSERT running from one session:mysql> insert into inno values(sleep(100));Query OK, 1 row affected (1 min 40.20 sec)allowed me to run INSERT in the other session that got blocked:mysql> insert into inno values(0);Query OK, 1 row affected (1 min 36.36 sec)and while it was blocked I’ve seen the thread state I wanted in the output of the SHOW PROCESSLIST:mysql> show processlist;+—-+——+———–+——+———+——+——————————+————————————-+———–+—————+| Id | User | Host      | db   | Command | Time | State                        | Info                                | Rows_sent | Rows_examined |+—-+——+———–+——+———+——+——————————+————————————-+———–+—————+| 16 | root | localhost | test | Query   |   17 | User sleep                   | insert into inno values(sleep(100)) |         0 |             0 || 17 | root | localhost | test | Query   |   13 | Waiting for table level lock | insert into inno values(0)          |         0 |             0 || 22 | root | localhost | test | Query   |    0 | starting                     | show processlist                    |         0 |             0 |+—-+——+———–+——+———+——+——————————+————————————-+———–+—————+3 rows in set (0.00 sec)So, I know how exactly to reproduce the problem and what it can be caused by. Any slow running single INSERT statement (caused by something complex executed in trigger, some slow function used in the list of values inserted, or, maybe just INSERT … SELECT … from the big table) will give us the desired thread state.Coming back to the first generic problem mentioned above, is there any way besides running mysqladmin debug and checking the output vs processlist etc, to identify the thread that holds MyISAM table level locks? One should expect performance_schema to help with this, at least as of 5.7 (I’ve used Percona Server 5.7.18-15 on my Ubuntu 14.04 netbook for testing today, while the original problem was on MariaDB 10.2.x. Percona is for fun, while MariaDB is for real work…). Indeed, we have table_handles there, and until recently I ignored its existence (it’s new in 5.7, and I am not even sure if MariaDB 10.2.x has it already).So, I reproduced the problem again and got the following there:mysql> select * from performance_schema.table_handles;+————-+—————+————-+———————–+—————–+—————-+—————+—————-+| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | OBJECT_INSTANCE_BEGIN | OWNER_THREAD_ID | OWNER_EVENT_ID | INTERNAL_LOCK | EXTERNAL_LOCK  |+————-+—————+————-+———————–+—————–+—————-+—————+—————-+| TABLE       | test          | t           |       140392772351024 |               0 |              0 | NULL          | NULL           || TABLE       | test          | t1          |       140392772352560 |               0 |              0 | NULL          | NULL           || TABLE       | test          | ttime       |       140392772355632 |               0 |              0 | NULL          | NULL           || TABLE       | test          | misam       |       140392772358704 |              55 |             10 | WRITE         | WRITE EXTERNAL || TABLE       | test          | misam       |       140392896981552 |              54 |             10 | WRITE         | WRITE EXTERNAL || TABLE       | test          | inno        |       140392772361776 |              55 |             10 | NULL          | WRITE EXTERNAL || TABLE       | test          | inno        |       140392896983088 |              54 |             10 | NULL          | WRITE EXTERNAL || TABLE       | test          | inno        |       140392826836016 |               0 |              0 | NULL          | NULL           || TABLE       | test          | misam       |       140392826837552 |               0 |              0 | NULL          | NULL           |+————-+—————+————-+———————–+—————–+—————-+—————+—————-+9 rows in set (0.00 sec)What I am supposed to do with the above to find out the MySQL thread id of the blocking session? I am supposed to join to performance_schema.threads table, looking for thread_id value that is the same as owner_thread_id above, 54 and 55. I get the following, note that processlist_id is what you are looking for in the processlist:mysql> select * from performance_schema.threads where thread_id in (54, 55)\G*************************** 1. row ***************************          THREAD_ID: 54               NAME: thread/sql/one_connection               TYPE: FOREGROUND     PROCESSLIST_ID: 27   PROCESSLIST_USER: root   PROCESSLIST_HOST: localhost     PROCESSLIST_DB: testPROCESSLIST_COMMAND: Sleep   PROCESSLIST_TIME: 90  PROCESSLIST_STATE: NULL   PROCESSLIST_INFO: NULL   PARENT_THREAD_ID: NULL               ROLE: NULL       INSTRUMENTED: YES            HISTORY: YES    CONNECTION_TYPE: Socket       THREAD_OS_ID: 32252*************************** 2. row ***************************          THREAD_ID: 55               NAME: thread/sql/one_connection               TYPE: FOREGROUND     PROCESSLIST_ID: 28   PROCESSLIST_USER: root   PROCESSLIST_HOST: localhost     PROCESSLIST_DB: testPROCESSLIST_COMMAND: Sleep   PROCESSLIST_TIME: 101  PROCESSLIST_STATE: NULL   PROCESSLIST_INFO: NULL   PARENT_THREAD_ID: NULL               ROLE: NULL       INSTRUMENTED: YES            HISTORY: YES    CONNECTION_TYPE: Socket       THREAD_OS_ID: 278442 rows in set (0.10 sec)I do not see a way to distinguish lock wait from actually holding the lock. All I know at the moment is that there is engine-level lock on misam table (and inno table, for that matter). If the MySQL thread id is NOT for the thread that is “Waiting for table level lock”, then it must be the thread that holds the lock! Some smart join with proper WHERE clause would let me to find out what I need directly. Maybe in one of the next parts I’ll even present it, but writing it from the top of my head in critical situation is above my current skills related to performance_schema.* * *Now, what those of us should do who do not have performance_schema enabled, or have to use version before 5.7? Or those with access to gdb and spare 5 minutes?Surely we should attach gdb to the mysqld process and, if in doubts, read some parts of the source code to know where to start. I started with the following fragments of code that were easy to find (as long as you know that COM_DEBUG command is actually sent by mysqladmin debug to server):openxs@ao756:~/git/percona-server$ grep -rni com_debug *include/mysql/plugin_audit.h.pp:160:  COM_DEBUG,include/mysql.h.pp:47:  COM_DEBUG,include/my_command.h:39:  COM_DEBUG,libmysql/libmysql.c:899:  DBUG_RETURN(simple_command(mysql,COM_DEBUG,0,0,0));rapid/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task_debug.h:62:  “[XCOM_DEBUG] “,sql/sql_parse.cc:1884:  case COM_DEBUG:^C…# from the line highlighted above…  case COM_DEBUG:    thd->status_var.com_other++;    if (check_global_access(thd, SUPER_ACL))      break;                                    /* purecov: inspected */    mysql_print_status();    query_logger.general_log_print(thd, command, NullS);    my_eof(thd);    break;openxs@ao756:~/git/percona-server$ grep -rni mysql_print_status *sql/sql_test.h:40:void mysql_print_status();sql/sql_parse.cc:59:#include “sql_test.h”         // mysql_print_statussql/sql_parse.cc:1888:    mysql_print_status();sql/sql_test.cc:456:void mysql_print_status()sql/mysqld.cc:69:#include “sql_test.h”     // mysql_print_statussql/mysqld.cc:2387:        mysql_print_status();   // Print some debug infoopenxs@ao756:~/git/percona-server$# from the line highlighted above…void mysql_print_status(){  char current_dir[FN_REFLEN];  STATUS_VAR current_global_status_var;  printf(“\nStatus information:\n\n”);  (void) my_getwd(current_dir, sizeof(current_dir),MYF(0));  printf(“Current dir: %s\n”, current_dir);  printf(“Running threads: %u  Stack size: %ld\n”,         Global_THD_manager::get_instance()->get_thd_count(),         (long) my_thread_stack_size);  thr_print_locks();                            // Write some debug info…  printf(“\nTable status:\n\…  display_table_locks();…# finally, let’s find the code of display_table_locks…openxs@ao756:~/git/percona-server$ grep -rni display_table_locks *sql/sql_test.cc:371:static void display_table_locks(void)sql/sql_test.cc:501:  display_table_locks();sql/mysqld.cc:9537:  { &key_memory_locked_thread_list, “display_table_locks”, PSI_FLAG_THREAD},openxs@ao756:~/git/percona-server$static void display_table_locks(void){  LIST *list;  Saved_locks_array saved_table_locks(key_memory_locked_thread_list);  saved_table_locks.reserve(table_cache_manager.cached_tables() + 20);  mysql_mutex_lock(&THR_LOCK_lock);  for (list= thr_lock_thread_list; list; list= list_rest(list))  {    THR_LOCK *lock=(THR_LOCK*) list->data;    mysql_mutex_lock(&lock->mutex);    push_locks_into_array(&saved_table_locks, lock->write.data, FALSE,                          “Locked – write”);    push_locks_into_array(&saved_table_locks, lock->write_wait.data, TRUE,                          “Waiting – write”);    push_locks_into_array(&saved_table_locks, lock->read.data, FALSE,                          “Locked – read”);    push_locks_into_array(&saved_table_locks, lock->read_wait.data, TRUE,                          “Waiting – read”);    mysql_mutex_unlock(&lock->mutex);  }…That’s enough to start. We’ll have to finally study what THR_LOCK is. There is a global (double) linked lists of all them, thr_lock_thread_list, and this is where we can get details from, in a way similar (let’s hope) to those server does while processing COM_DEBUG command for us. So, let’s attach gdb and let’s fun begin:openxs@ao756:~/git/percona-server$ sudo gdb -p 23841[sudo] password for openxs:GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1…Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.20x00007fafd93d9c5d in poll () at ../sysdeps/unix/syscall-template.S:8181      ../sysdeps/unix/syscall-template.S: No such file or directory.(gdb) set $list=(LIST *)thr_lock_thread_list(gdb) set $lock=(THR_LOCK*) $list->data(gdb) p *($lock)$1 = {list = {prev = 0x0, next = 0x7fafd2fa4780, data = 0x7fafbd545c80},  mutex = {m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0,        __nusers = 0, __kind = 3, __spins = 0, __elision = 0, __list = {          __prev = 0x0, __next = 0x0}},      __size = ’00’ <repeats 16 times>, “03”, ’00’ <repeats 22 times>,      __align = 0}, m_psi = 0x7fafd1f97c80}, read_wait = {data = 0x0,    last = 0x7fafbd545cc8}, read = {data = 0x0, last = 0x7fafbd545cd8},  write_wait = {data = 0x0, last = 0x7fafbd545ce8}, write = {    data = 0x7fafbd69f188, last = 0x7fafbd69f190}, write_lock_count = 0,  read_no_write_count = 0, get_status = 0xf84910 <mi_get_status>,  copy_status = 0xf84a70 <mi_copy_status>,  update_status = 0xf84970 <mi_update_status>,  restore_status = 0xf84a50 <mi_restore_status>,  check_status = 0xf84a80 <mi_check_status>}In the above we started from the first item in the list. If needed we can move on to the next with set $list=(LIST *)($list->next) etc. We then interpret $list->data as a (THD_LOCK*). There we can see read, read_wait, write and write_wait structures. One of them will have data item that is not 0x0. This way we can identify is it a lock or lock wait, and what kind of lock is it. In my case i was looking for a write lock, and here it is, the very first one. So, I continue:(gdb) p *($lock)->write.data$2 = {owner = 0x7fafbd468d38, next = 0x0, prev = 0x7fafbd545cf8,  lock = 0x7fafbd545c80, cond = 0x0, type = TL_WRITE,  status_param = 0x7fafbd69ee20, debug_print_param = 0x7fafbd515020,  m_psi = 0x7fafa4c06cc0}We can check the owner field:(gdb) p *($lock)->write.data.owner$3 = {thread_id = 16, suspend = 0x7fafbd469370}and thread_id there is what we were looking for, the MySQL thread id of the blocking thread. If we want to get the details about the table locks we can do this as follows, by studying debug_print_param in data:(gdb) set $table=(TABLE *) &(*($lock)->write.data->debug_print_param)(gdb) p $table$4 = (TABLE *) 0x7fafbd515020(gdb) p *($table)$5 = {s = 0x7fafbd4d7030, file = 0x7fafbd535230, next = 0x0, prev = 0x0,  cache_next = 0x0, cache_prev = 0x7fafc4df05a0, in_use = 0x7fafbd468000,  field = 0x7fafbd4d7440, hidden_field_count = 0, record = {    0x7fafbd4d7430 “\377”, 0x7fafbd4d7438 “\n”}, write_row_record = 0x0,  insert_values = 0x0, covering_keys = {map = 0}, quick_keys = {map = 0},  merge_keys = {map = 0}, possible_quick_keys = {map = 0},  keys_in_use_for_query = {map = 0}, keys_in_use_for_group_by = {map = 0},  keys_in_use_for_order_by = {map = 0}, key_info = 0x7fafbd4d7508,  next_number_field = 0x0, found_next_number_field = 0x0, vfield = 0x0,  hash_field = 0x0, fts_doc_id_field = 0x0, triggers = 0x0,  pos_in_table_list = 0x7fafbd47a9b8, pos_in_locked_tables = 0x7fafbd4e5030,  group = 0x0, alias = 0x7fafbbbad120 “misam”,  null_flags = 0x7fafbd4d7430 “\377”, bitmap_init_value = 0x0, def_read_set = {    bitmap = 0x7fafbd4d75a8, n_bits = 1, last_word_mask = 4294967294,    last_word_ptr = 0x7fafbd4d75a8, mutex = 0x0}, def_write_set = {    bitmap = 0x7fafbd4d75ac, n_bits = 1, last_word_mask = 4294967294,    last_word_ptr = 0x7fafbd4d75ac, mutex = 0x0}, tmp_set = {    bitmap = 0x7fafbd4d75b0, n_bits = 1, last_word_mask = 4294967294,    last_word_ptr = 0x7fafbd4d75b0, mutex = 0x0}, cond_set = {    bitmap = 0x7fafbd4d75b4, n_bits = 1, last_word_mask = 4294967294,    last_word_ptr = 0x7fafbd4d75b4, mutex = 0x0},  def_fields_set_during_insert = {bitmap = 0x7fafbd4d75b8, n_bits = 1,    last_word_mask = 4294967294, last_word_ptr = 0x7fafbd4d75b8, mutex = 0x0},—Type <return> to continue, or q <return> to quit—  read_set = 0x7fafbd515128, write_set = 0x7fafbd515148,  fields_set_during_insert = 0x7fafbd5151a8, query_id = 0, quick_rows = {    0 <repeats 64 times>}, const_key_parts = {0 <repeats 64 times>},  quick_key_parts = {0 <repeats 64 times>}, quick_n_ranges = {    0 <repeats 64 times>}, quick_condition_rows = 0, lock_position = 0,  lock_data_start = 0, lock_count = 1, temp_pool_slot = 0, db_stat = 39,  current_lock = 1, nullable = 0 ’00’, null_row = 0 ’00’,  status = 3 ’03’, copy_blobs = 0 ’00’, force_index = 0 ’00’,  force_index_order = 0 ’00’, force_index_group = 0 ’00’,  distinct = 0 ’00’, const_table = 0 ’00’, no_rows = 0 ’00’,  key_read = 0 ’00’, no_keyread = 0 ’00’, locked_by_logger = 0 ’00’,  no_replicate = 0 ’00’, locked_by_name = 0 ’00’,  fulltext_searched = 0 ’00’, no_cache = 0 ’00’,  open_by_handler = 0 ’00’, auto_increment_field_not_null = 0 ’00’,  insert_or_update = 0 ’00’, alias_name_used = 0 ’00’,  get_fields_in_item_tree = 0 ’00’, m_needs_reopen = 0 ’00’,  created = true, max_keys = 0, reginfo = {join_tab = 0x0, qep_tab = 0x0,    lock_type = TL_WRITE, not_exists_optimize = false,    impossible_range = false}, mem_root = {free = 0x7fafbd4d7420,    used = 0x7fafbd535220, pre_alloc = 0x0, min_malloc = 32, block_size = 992,    block_num = 6, first_block_usage = 0, max_capacity = 0,    allocated_size = 2248, error_for_capacity_exceeded = 0 ’00’,    error_handler = 0xc4b240 <sql_alloc_error_handler()>, m_psi_key = 106},—Type <return> to continue, or q <return> to quit—  blob_storage = 0x0, grant = {grant_table = 0x0, version = 0,    privilege = 536870911, m_internal = {m_schema_lookup_done = true,      m_schema_access = 0x0, m_table_lookup_done = true,      m_table_access = 0x0}}, sort = {filesort_buffer = {m_next_rec_ptr = 0x0,      m_rawmem = 0x0, m_record_pointers = 0x0, m_sort_keys = 0x0,      m_num_records = 0, m_record_length = 0, m_sort_length = 0,      m_size_in_bytes = 0, m_idx = 0}, io_cache = 0x0, merge_chunks = {      m_array = 0x0, m_size = 0}, addon_fields = 0x0,    sorted_result_in_fsbuf = false, sorted_result = 0x0,    sorted_result_end = 0x0, found_records = 0}, part_info = 0x0,  all_partitions_pruned_away = false, mdl_ticket = 0x7fafbd4c4f10,  m_cost_model = {m_cost_model_server = 0x0, m_se_cost_constants = 0x0,    m_table = 0x0}, should_binlog_drop_if_temp_flag = false}We already see table alias, but more details are in the s filed (of TABLE_SHARE * type):(gdb) p $table->s$26 = (TABLE_SHARE *) 0x7fafbd4d7030…(gdb) p $table->s->table_cache_key$27 = {str = 0x7fafbd4d73b8 “test”, length = 11}…(gdb) p $table->s->db$30 = {str = 0x7fafbd4d73b8 “test”, length = 4}(gdb) p $table->s->path$31 = {str = 0x7fafbd4d73c8 “./test/misam”, length = 12}(gdb) p $table->s->table_name$32 = {str = 0x7fafbd4d73bd “misam”, length = 5}# to remind you, this is how to get the thread id directly(gdb) p (*($lock)->write.data.owner).thread_id$37 = 16I skilled some of my tests and intermediate results. To summarize, for me it took only a bit more time to search the code and try commands in gdb than to find out that metadata_locks does not help and one has to use table_handles in performance_schema. Next time in gdb I’ll do it instantly, and the number of key strokes is far less :)Surely, for production environments I’ll have to write, test and note somewhere proper queries to the performance_schema. For next attempts with gdb and, maybe, FOSDEM talks I’ll have to read the code to remember key details about the data structures used above… More posts to come, stay tuned!

Fun with Bugs #53 – On Some Percona Software Bugs I’ve Reported

So far in this series I had written only/mostly about MySQL server bugs. Does it mean that there are no unique or interesting bugs in other MySQL-related software or MySQL forks and projects that use MySQL code?Surely no, it doesn’t. So, today I’d like to present a short list of currently active (that is, new or probably not yet resolved) bugs that I had reported for Percona’s software at Launchpad over the years I use it. The list is based on this link, but I excluded minor reports and those that probably are just forgotten, while the problem is fixed long time ago.Let me start with bugs in XtraBackup, the tool that plays a key role in most MySQL environments that are not using Oracle’s Enterprise subscriptions:

lp:1704636 – “xtrabackup 2.4.7 crashes while backing up MariaDB 10.2.x with ftwrl-* options”. I’ve reported it yesterday. Probably some unique background thread of MariaDB 10.2.x is not taken into account and some calculations are performed for it that lead to crash. Sorry, but there is no way to use –ftwrl-* options of XtraBackup with MariaDB 10.2.x.

lp:1418150 – “Provide a way for xtrabackup to communicate with storage managers via XBSA API”. This is a feature request created based on customer’s request to store backups in Tivoli Storage Manager. Similar request was recently noted from MariaDB customer as well. MySQL’s Enterprise Backup allows to work with storage managers, even if not in the most direct way. I’ve used TSM a lot in the past in production as Informix DBA, so I care about this feature more than about many others…
Let’s continue with another unique software package from Percona, Percona Toolkit. I still use some of the tools every day (and do not hesitate to write about this), and would like to see more attention to the toolkit in areas other than MongoDB. I’ve reported the following two bugs in the past that are still active:

lp:1566939 – “pt-stalk should not call mysqladmin debug by default”. Having in mind documentation Bug #71346 that Oracle just refuses to fix, I’d say there may be a dozen of people in the world who can make a good use of the output even when it’s correct, useful and really needed. For the rest of us it just pollutes error logs and pt-stalk data collections.

lp:1533271 – “pt-online-schema-change does not report anything when it waits for slave to replicate new table creation”. The problem is clear enough from description. There is nothing useful to find out the reason for the hang in this case without PTDEBUG=1.
Finally, let’s check several Percona Server bugs. Note that in many cases i had to file bug for Percona Server even if upstream MySQL was also affected, in a hope that Percona may fix the problem faster (and this happened sometimes). The following bugs were not inherited from MySQL directly:

lp:1455432 – “Audit log plugin should allow to include or exclude logging for specific users”. This was request for the same functionality that was originally provided in MariaDB’s implementation. Actually the bug was fixed in Percona Server 5.6.32-78.0 and 5.7.14-7, but remains in the list as it is still just “Triaged” for 5.5.x branch. Time to close it completely, I’d say.

lp:1408971 – “Manual does not describe Innodb_current_row_locks in any useful way”. Nothing changed, check the manual.

lp:1401528 – “Add option to switch to “old” get_lock() behavior”. It’s hard to introduce new features in minor release during GA period, and do it right. Somebody had to think about backward compatibility, upgrades and downgrades, and about the completeness of implementation (see lp:1396336 about that). In this specific case Percona failed to do it right.

lp:1396035 – “Percona-Server-test-56 RPM package does NOT check /usr/share/percona-server directory and thus is unusable “as is”. Packaging is another thing that is hard to do right. I was not lazy today, so I re-check this on CentOS 6.x. with 5.7.18, and ./mtr is somewhat useful there:[root@centos mysql-test]# ls -l ./mtrlrwxrwxrwx. 1 root root 19 Jun  7 18:20 ./mtr -> ./mysql-test-run.pl[root@centos mysql-test]# perl ./mtr analyzeLogging: ./mtr  analyzeMySQL Version 5.7.18Checking supported features… – SSL connections supportedCollecting tests…Checking leftover processes…Removing old var directory…Creating var directory ‘/usr/share/mysql-test/var’…Installing system database…Using parallel: 1==============================================================================TEST                                      RESULT   TIME (ms) or COMMENT————————————————————————–worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009worker[1] mysql-test-run: WARNING: running this script as _root_ will cause some tests to be skippedmain.analyze                             [ pass ]  10920————————————————————————–The servers were restarted 0 timesSpent 10.920 of 58 seconds executing testcasesCompleted: All 1 tests were successful.

lp:1263968 – “DOCS: http://www.percona.com/doc/percona-server/5.6/installation.html does not provide enough details on further steps after installing .deb”. I’d still say that some details requested are missing. Writing proper manuals is hard.
I had surely reported a lot more bugs, but many of them have related upstream MySQL ones, so there are no reasons to discuss them separately now. Also, note that in Percona I was involved in setting priorities for the bugs and mostly reported them based on customer issues or complains. So, no wonder that my bug reports were fixed fast, and not so many active ones left by now.There was yet another Percona tool that I spent as lot of time on, both on building it from current sources, using it and  reporting problems with it. It’s Percona Playback, Based on this, they seem to work on other but similar tool now. Still, I consider this request useful for any tool that allows to replay specific load: lp:1365538.

Fun with Bugs #51 – My Bug Reports that Oracle doesn’t Want to Fix

This week I noticed (yet another) customer issue related to the output produced by mysqladmin debug command (or when mysqld process gets SIGHUP signal). I mean the output generated by the mysql_print_status() function. In this issue the content of the output was misinterpreted. I’ve seen this in the past more than once, and requested to document the output properly, but it never happened for a reason that there is an internal feature request to put this information elsewhere, in Performance Schema or Information Schema. The bug ended up with “Won’t fix” status.Surely I complained in a comment and on Facebook, and then decided to check if there are any other my bug reports and documentation request that Oracle explicitly decided not to fix after accepting the fact that there is a problem.I’ve ended up with the following short list:

Bug #69399 – “Inconsistency in crash report”. Here I’ve got a perfect reason to keep things as they are currently implemented. Functions called from signal handlers must be async signal safe, and time() is like that, but it always outputs in UTC. It would be great to print time in UTC in some messages as well, so that timezone difference is 100% clear, but it’s truly not a big deal.

Bug #71300 – “Manual does not explain that statement/abstract/* instruments appeared in 5.6.15”. This change in naming that happened in 5.6.15 is, indeed, explained in the manual, even if there is no highlighted statements about incompatible change etc. I can live with that.

Bug #71303 – “Manual page on P_S build configuration does not provide enough details”. I really missed the details at that time on how to instrument individual buffer’s mutexes/rwlocks, after getting a hint during my talk that I had no chance to see real most important waits in Bug #68079  with Performance Schema without recompiling it properly. I’ve got a useful comment at the end of the bug report, but I truly do not understand why this detail (“The only way to enable it is by removing the line which defines PFS_SKIP_BUFFER_MUTEX_RWLOCK in storage/innobase/include/sync0sync.h. Seems to be no compiler flags to enable or disable the above mentioned symbol.”) was not added to the small enough page as a note.

Bug #71304 – “Manual does not provide enough details about automatic sizing of P_S parameters “. Here my suggestions were refused. Go figure yourself, check the output of mysqld –verbose –help 2>/dev/null | grep performance | grep “\-1” to find out what parameters are auto-sized and go read the code to find out how exactly, if you care. They don’t.

Bug #71274 – “Manual does not provide enough details about background threads in P_S.threads”. All I’ve got in reply is: “The purpose of the page is to describe the table structure, not enumerate the (subject to change) set of background threads.” You may be satisfied with this remark, but I am not.

Bug #71346 – “Manual does not provide details on mysqladmin debug output”. As you can check here, even for MySQL 8 the command is still there, but all we have about the output is: “Tell the server to write debug information to the error log. Format and content of this information is subject to change. This includes information about the Event Scheduler.”

Bug #75366 – “mysql_install_db requires explicit –lc-messages-dir if non-default PREFIX used”. This was reported at early MySQL 5.7.x development stage, and I’ve got a recommendation to use mysqld –initialize instead. I do so now, but sometimes some related problem still happen, see Bug #84173 and Bug #80351. I think that even deprecated commands must be properly documented, including any incompatible changes in behavior, options, binaries location etc, until they still exist in GA and supported versions of MySQL.

Bug #78822 – “Materialized table from semijoin may change join order and lead to bad plan”. In MySQL 5.7 the problem must be fixed, and for 5.6 the following obvious workaround was suggested: set optimizer_switch=”semijoin=off”;

Bug #80601 – “Manual is wrong/not clear while explaining kill flag check for ALTER TABLE”.  Even though it was stated that “this is an implementation detail subject to change”, some clarifications happened in the manual.
To summarize, out of 310 bug reports I’ve created since 2005, Oracle decided not to fix just 9, and in many cases provided proper explanations about the reasons to do this, or made some changes in the manual. The remaining cases all are related to MySQL manual and mostly happened in 2014, when nice people found a way to shut me up (temporary) on the topic of MySQL bugs…

Fun with Bugs #46 – On Some Bugs I’ve Reported During the Year of 2016

It’s time to summarize the year of 2016. As a kind of a weird summary, in this post I’d like to share a list of MySQL bug reports I’ve created in 2016 that are still remaining “Verified” today:Bug #79831 – “Unexpected error message on crash-safe slave with max_relay_log_size set”. According to Umesh this is not repeatable with 5.7. The fact that I’ve reported the bug on January 4 probably means I was working at that time. I should not repeat this mistake again next year.Bug #80067 – “Index on BIT column is NOT used when column name only is used in WHERE clause”. People say the same problem happens with INT and, what may be even less expected, BOOLEAN columns.Bug #80424 – “EXPLAIN output depends on binlog_format setting”. Who could expect that?Bug #80619 – “Allow slave to filter replication events based on GTID”. In this feature request I’ve suggested to implement filtering by GTID pattern, so that we can skip all events originating from specific master on some slave in a complex replication chain.Bug #82127 – “Deadlock with 3 concurrent DELETEs by UNIQUE key”. It’s clear that manual is not even close to explaining how the locks are really set “by design” in this weird case. See comments in MDEV-10962 for some explanations. Nobody from Oracle event tried to really explain how things are designed to work.Bug #82212 – “mysqlbinlog can produce events larger than max_allowed_packet for mysql”. This happens for encoded row-based events. There should be some way to take this overhead into account while creating binary log, IMHO.Bug #83024 – “Internals manual does not explain COM_SLEEP in details”. One day you’ll see Sleep for some 17 seconds logged into the slow query log, and may start to wonder why…Bug #83248 – “Partition pruning is not working with LEFT JOIN”. You may find some interesting related ideas in MDEV-10946.Bug #83640 – “Locks set by DELETE statement on already deleted record”. This case shows that design of locking in InnoDB does produce really weird outcomes sometimes. This is not about “missing manual”, this is about extra lock set that is absolutely NOT needed (a gap X lock on a record in the secondary unique index is set when the same transaction transaction already has the next key lock on it). As a side note, I keep finding, explaining and reporting weird or undocumented details in InnoDB locking for years, still my talk about InnoDB locks was not accepted by Oracle once again for OOW in 2016. What do I know about the subject and who even cares about those locks… Bug #83708 – “uint expression is used for the value that is passed as my_off_t for DDL log”. I was really shocked by this finding. I assumed that all uint vs unsigned long long improper casts are already found. It seems I was mistaking.Bug #83912 – “Time spent sleeping before entering InnoDB is not measured/reported separately”. The use case that led me to reporting this bug is way more interesting than the fact that some wait is not instrumented in performance_schema. You may see more related bug reports from me next year.Bug #83950 – “LOAD DATA INFILE fails with an escape character followed by a multi-byte one”. This single bug (and related bugs and stories) were original topic for issue #46 of my “Fun With Bugs” series. I was not able to write everything I want properly over last 3 weeks, but trust me: it’s a great story, of “Let’s Make America Great Again” style. With the goal for LOAD DATA to behave exactly as INSERT when wrong utf8 data are inserted, Oracle changed the way LOAD DATA works back and forth, with the last change (back) happened in 5.7.17: “Incompatible Change: A change made in MySQL 5.7.8 for handling of multibyte character sets by LOAD DATA was reverted due to the replication incompatibility (Bug #24487120, Bug #82641)”I just can not keep up with all the related fun people have in replication environments thanks to these ongoing changes… It’s incredible.Bug #84004 – “Manual misses details on MDL locks set and released for online ALTER TABLE”. Nothing new: locks in MySQL are not properly/completely documented, metadata locks included. yes, they are documented better now, after 11+ years of my continuous efforts (of a kind), but we are “not there yet”. I am still waiting for a job offer to join MySQL Documentation Team, by the way :)Bug #84173 – “mysqld_safe –no-defaults & silently does NOT work any more”. Recent MySQL 5.7.17 release had not only given us new Group Replication plugin and introduced incompatible changes. In a hope to fix security issues it comes with pure regression – for the first time in last 11 years mysqld_safe –no-defaults stopped working for me! By the way, mysqld_safe is still NOT safe in a sense that 5.7.17 tried to enforce, and one day (really soon) you will find out why.Bug #84185 – “Not all “Statements writing to a table with an auto-increment…” are unsafe”. If you do something like DELETE FROM `table` WHERE some_col IN (SELECT some_id FROM `other_table`) where `table` has auto_increment column, why should anyone care about it? We do not generate the value, we delete rows… This bug report was actually created by Hartmut Holzgraefe and test case comes from Elena Stepanova (see MDEV-10170). I want to take this opportunity to thank them and other colleagues from MariaDB for their hard work and cooperation during the year of 2016. Thanks to Umesh (who processed most of my bug reports),  Sinisa Milivojevic and Miguel Solorzano for their verifications of my bug reports this year.In conclusion I should say that, no matter how pointless you may consider this activity, I still suggest you to report each and every problem that you have with MySQL and can not understand after reading the manual, as a public MySQL bug. Now, re-read my 4 years old post on this topic and have a Happy and Fruitful New Year 2017!

MySQL Support Engineer’s Chronicles, Issue #4

This week I had to deal with some unusual problems. But let me start with Percona’s xtrabackup, software that I consider a key component of many current production MySQL setups and use regularly. Recently new minor versions of XtraBackup were released, check the details on 2.4.5, for example. It made a step towards support of MariaDB 10.2, but it’s still a long way to go, see this pull request #200.My main problem with xtrabackup, though, is not with lack of support of MariaDB 10,2-specific features. Why should they care, after all… The problem is that old well known bugs and problems are not resolved, those that may affect all MySQL versions, forks and environments. Check lp:1272329 , “innobackupex dies if a directory is not readable. i.e.: lost+found”, for example. Why not to read and take into account ignore_db_dir option (as server does) and let those poor souls who used mount point as a datadir to make backups? Check even older problem, passwords that are not hidden in the command lines, see lp:907280, “innobackupex script shows the password in the ps output, when its passed as a command line argument”. My colleague Hartmut even suggested the fix recently, see pull request #289.Because of these old, known problems (some of them being low hanging fruits) that are not fixed users still suffer while using xtrabackup way more often than they would like to. One day, as a result, they’ll have to switch to some other online backup tools or approaches. One may dream about extended myrocks_hotbackup to cover InnoDB one day (when MyRocks and InnoDB will work together in one instance), or just use Percona TokuBackup (after adding script to go SST for Galera with it, maybe), or try something totally different. Anyway, I feel that if more bugs (including low hanging fruits) in xtrabackup are not getting fixed and pull requests are not actively accepted, the tool may become much less relevant and used soon.I had to deal with MaxScale-related issues this week, so I’d like to remind those who use Docker for testing about https://github.com/asosso/maxscale-docker/blob/master/Dockerfile. Personally I prefer to build from source. In any case, I’d like us all to remember that in older versions one may have to set strip_db_esc option explicitly for service to deal with database names containing underscore (_). Recent 2.0.x versions have it enabled by default (see MXS-801).I also had to explain how online ALTER TABLE works, specifically, when it sets exclusive metadata locks in the process. I still do not see this topic properly explained in the manual, so I had to report Bug #84004, “Manual misses details on MDL locks set and released for online ALTER TABLE”.By no means I am a developer for 11 years already, even less one should expect writing Java code from me. Anyway, I had to explain how to replace Oracle’s ref_cursors (a.k.a cursor variables) in MySQL, both in stored procedures and in Java code that calls them. If you are wondering what is this about, check this fine manual. Note that this feature is missing in MySQL, even though it was suggested to implement it here. In general, MySQL allows just to run SELECTs in stored procedures and then in Java you can process each of the result sets returned any way you want. Things may get more complicated when more than one result set is produced, and they are even more complicated in Oracle with nested cursor expressions. So, I plan to devote a separate blog post to this topic one day. Stay tuned.I answer questions coming not only from customers. Old friends, community users out of nowhere and, even more, colleagues are welcomed to discuss whatever MySQL- or MariaDB-related  problem they may have. If I know how to help, I’ll do this, otherwise I’ll quickly explain that I am of no good use. This is how I ended up testing MariaDB’s CONNECT storage engine quickly as a replacement for the Federated engine, that is, to link table to a remote MySQL table. Basic instructions on how to set it up and use MySQL type looked simple, but when I tried to test on Fedora 23 and hit a problem of missing libodbc.so.1:MariaDB [(none)]> INSTALL SONAME ‘ha_connect’;ERROR 1126 (HY000): Can’t open shared library ‘/home/openxs/dbs/mariadb10.1/lib/plugin/ha_connect.so’  (errno: 2, libodbc.so.1: cannot open shared object file: No such file or directory)the solution was not really straightforward. First of all I had to install unixODBC.x86_64 2.3.4-1.fc23 RPM, but it also does not provide libodbc.so.1:[openxs@fc23 node2]$ find / -name libodbc.* 2>/dev/null/usr/lib64/libodbc.so.2/usr/lib64/libodbc.so/usr/lib64/libodbc.so.2.0.0So, I had to apply a quick and dirty hack:[openxs@fc23 node2]$ sudo ln -s /usr/lib64/libodbc.so.2.0.0  /usr/lib64/libodbc.so.1As a result CONNECT engine worked as expected, as long as proper account and IP-address where used: MariaDB [test]> INSTALL SONAME ‘ha_connect’;Query OK, 0 rows affected (0.27 sec)MariaDB [test]> create table r_t2(id int primary key, c1 int) engine=connect table_type=mysql connection=’mysql://msandbox:msandbox@127.0.0.1:23532/test/t’;Query OK, 0 rows affected (0.04 sec)MariaDB [test]> select * from r_t2;                     +—-+——+| id | c1   |+—-+——+|  1 |    2 ||  2 |    3 |+—-+——+2 rows in set (0.00 sec)From configuring MaxScale to work with database having underscore in the name to re-writing Java code that used to work with Oracle RDBMS for MySQL, with many missing details in the manuals and software bugs identified or reported in between, and all that with ongoing studies of performance problems and lost quorums, rolling upgrades and failed SSTs in Galera clusters – this is what support engineers here in MariaDB have to deal with during a typical working week.

MySQL Support Engineer’s Chronicles, Issue #3

The original idea of this series was to publish one post per week, but it seems every other week I have some special topic that well deserves a dedicated post. Last week I had no time to complete my writing because of long (and, I hope, useful) Howto post on replacing corrupted partition using non-corrupted one from other server in replication setup. But I had links and notes collected in a draft that I am going to complete now.First of all, during the previous week I had time to submit two more talks for the  “MySQL and Friends Devroom” at FOSDEM 2017, “Applying profilers to MySQL” and “Background Story of a MySQL Bug”. Call for papers is still open, as far as I understand, so I may come up with few more ideas on what to talk about.Strange to admit this, but sometimes I can be excited with something MySQL-related. Two weeks ago I’ve added a note to myself about great, detailed changelogs that MariaDB publishes, with proper links to GitHub commits. Check this one for 10.0.28, for example. I wish Oracle provides the same level of details for MySQL releases as a part of their official release notes.Still, sometimes important changes happen in upstream MySQL, get merged, and details about inherited incompatible change (and its real impact) are still missing in any release notes of any kind. Let’s consider recent example. Historically MySQL treated incorrect utf8 bytes sequences differently for INSERT than for LOAD DATA. INSERT failed, LOAD DATA just had the value truncated on the first incorrect character and continued. Eventually (in MySQL 5.6.32) this was fixed by upstream MySQL (it was also fixed in a wider context in MariaDB 10.2 in frames of MDEV-9823). MySQL 5.6.32 release notes says about the incompatible change:”Incompatible Change: For multibyte character sets, LOAD DATA could fail to allocate space correctly and ignore input rows as a result. (Bug #76237, Bug #20683959, Bug #23080148) References: This issue is a regression of: Bug #14653594.”But it surely says nothing about the impact for replication or another Bug #78758 that is fixed in 5.6.32 (by making LOAD DATA fail with error). It costed me some time to figure out all the details. Read MDEV-11217 for the historical details, nice test case and different views on the decisions made. Note also that now error message about the bad utf8 character from LOAD DATA looks (IMHO) weird enough, as it contains actually only a valid part of the string. See my MDEV-11216 about this.I am still having fun with InnoDB locks. This week I checked what locks are set when the same DELETE (for the table with the primary key and unique secondary index) is executed twice in the same transaction. Check Bug #83640 and tell me how this sequence of locks set in one transaction may make any sense:—TRANSACTION 636201, ACTIVE 202 sec5 lock struct(s), heap size 1184, 4 row lock(s), undo log entries 1MySQL thread id 1, OS thread handle 0x7f9e513a7700, query id 92 localhost root initshow engine innodb statusTABLE LOCK table `test`.`tu` trx id 636201 lock mode IXRECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gapRECORD LOCKS space id 11 page no 3 n bits 72 index `PRIMARY` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gapRECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode XRECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks gap before recMy colleague Jan Lindström was also surprised, so we have MDEV-11215 as well, and a chance to see this studies and maybe changed by MariaDB engineers. Related problems were discussed in the past, see Bug #19762 and Bug #55717.Some days I keep wondering what happens to XtraBackup these days in Percona. As far as I remember I was not able to reproduce lp:1461833 while working there, but the fact that the bug is still open and got no comments since I re-opened it gives me no additional confidence. I report bugs and missing details in MySQL way too often even for my own liking. But MySQL manual really misses many details to explain results that users see in production. This week I’d like to remind about one of my bug reports about missing details in MySQL documentation, Bug #77390, and my request there:”Please, explain all metadata and InnoDB locks set by online ALTER, with examples and details enough to explain non-trivial cases…” Honestly, until this month I never noted that DDL log exists in MySQL. This is a ddl_log.log binary file that can be “dumped” into a somewhat readable form using a script by Mattias Jonsson from Bug #47343:[openxs@fc23 5.7]$ perl ~/ddl_log_dump.pl data/ddl_log.logHeader: Num entries: 4 Name length: 512 Block length 4096Entry 1 type i action s next 0  name ./test/trange2not ‘e’ entry (i)Entry 2 type l action d next 0  name ./test/#sql-trange2not ‘e’ entry (l)Entry 3 type l action d next 2  name ./test/trange2#P#pmaxnot ‘e’ entry (l) This file may grow until MySQL server restart completes, but what’s worse, when it grows over 4GB in size it becomes unusable and effectively blocks any further concurrent DDL until we get rid of it. I had a lot of fun reading the code and reporting related Bug #83708. Unlucky users who do a lot of partitioning-related DDL may find the situation less funny when they hit this bug.I plan to describe what I had to work on this week soon, while I still remember all the relevant details and feelings. So, stay tuned!

Using gdb to understand what locks (and when) are really set by InnoDB. Part II.

In the previous post we checked lock requests while executing INSERT INTO t … SELECT FROM tt where there is an auto_increment column for which the value is generated in the destination table with default innodb_autoinc_lock_mode=1. Based on it I’ve reported Bug #76563 that is already verified.Let’s continue to study a special case of the statement mentioned above, INSERT INTO t … SELECT FROM t, that is, when source and destination table is the same. We again start with table t having just 4 rows:mysql> rollback;Query OK, 0 rows affected (0.01 sec) mysql> select * from t;+—-+——+| id | val  |+—-+——+|  1 |    1 ||  2 |    2 ||  3 |    3 ||  4 |    4 |+—-+——+4 rows in set (0.00 sec)mysql> start transaction;Query OK, 0 rows affected (0.00 sec)mysql> insert into t(val) select 100 from t;Tracing in gdb with the same breakpoints set shows the following:[Switching to Thread 0x7fd74cf79700 (LWP 1866)]Breakpoint 1, lock_table (flags=0, table=0x7fd7233f69e8, mode=LOCK_IS, thr=0x7fd7233de6f0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:44264426            if (flags & BTR_NO_LOCKING_FLAG) {(gdb) p table->name$41 = 0x7fd7383feac0 “test/t”So, we start with IS lock on the table. Probably we plan to read some rows. Let’s continue:(gdb) cContinuing.Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249af000, heap_no=2, index=0x7fd7233f7b68, thr=0x7fd7233de6f0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:23292329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {(gdb) p index->table_name$42 = 0x7fd7383feac0 “test/t”(gdb) cContinuing.Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249af000, heap_no=3, index=0x7fd7233f7b68, thr=0x7fd7233de6f0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:23292329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {(gdb) cContinuing.Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249af000, heap_no=4, index=0x7fd7233f7b68, thr=0x7fd7233de6f0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:23292329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {(gdb) cContinuing.Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249af000, heap_no=5, index=0x7fd7233f7b68, thr=0x7fd7233de6f0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:23292329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {(gdb) cContinuing.Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249af000, heap_no=1, index=0x7fd7233f7b68, thr=0x7fd7233de6f0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:23292329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {We set S (mode=2) locks on each row in the “source” (t) table and supremum record there, to begin with. We read all rows that we plan to insert into the temporary table before inserting them. One could assume this just from checking the plan:mysql> explain insert into t(val) select 100 from t\G*************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: t         type: indexpossible_keys: NULL          key: PRIMARY      key_len: 4          ref: NULL         rows: 4        Extra: Using index; Using temporary1 row in set (0.00 sec) Note that this stage, selecting all rows into the temporary table, may take a long time for a big table t. During all this time there is NO AUTO-INC lock set. Probably it explains what happened in cases discussed in the original Bug #72748.(gdb) cContinuing.Breakpoint 3, row_lock_table_autoinc_for_mysql (prebuilt=0x7fd7233e3068) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc:11461146            if (trx == table->autoinc_trx) {(gdb) cContinuing.Breakpoint 1, lock_table (flags=0, table=0x7fd7233f69e8, mode=LOCK_AUTO_INC, thr=0x7fd7233e3780) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:44264426            if (flags & BTR_NO_LOCKING_FLAG) {(gdb) p table->name$43 = 0x7fd7383feac0 “test/t”Only now we set AUTO-INC lock on the “destination” table (again t in this case) and proceed with expected insertion:(gdb) cContinuing.Breakpoint 1, lock_table (flags=0, table=0x7fd7233f69e8, mode=LOCK_IX, thr=0x7fd7233e3780) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:44264426            if (flags & BTR_NO_LOCKING_FLAG) {(gdb) p table->name$44 = 0x7fd7383feac0 “test/t”(gdb) cContinuing.Breakpoint 3, row_lock_table_autoinc_for_mysql (prebuilt=0x7fd7233e3068) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc:11461146            if (trx == table->autoinc_trx) {(gdb) cContinuing.Breakpoint 3, row_lock_table_autoinc_for_mysql (prebuilt=0x7fd7233e3068) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc:11461146            if (trx == table->autoinc_trx) {(gdb) cContinuing….Our UPDATE completes finally:…Query OK, 4 rows affected, 1 warning (5 min 45.02 sec)Records: 4  Duplicates: 0  Warnings: 1mysql> show warnings\G*************************** 1. row ***************************  Level: Note   Code: 1592Message: Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-increment column after selecting from another table are unsafe because the order in which rows are retrieved determines what (if any) rows will be written. This order cannot be predicted and may differ on master and the slave.1 row in set (0.00 sec)mysql> select * from t;+—-+——+| id | val  |+—-+——+|  1 |    1 ||  2 |    2 ||  3 |    3 ||  4 |    4 || 12 |  100 || 13 |  100 || 14 |  100 || 15 |  100 |+—-+——+8 rows in set (0.00 sec)To summarize, the following locks are set (and in the following order) while executing INSERT INTO t … SELECT FROM t where there is an auto_increment column for which the value is generated (with default innodb_autoinc_lock_mode=1 and using default REPEATABLE READ transaction isolation level) according to our tracing in gdb:IS lock on the table (t)S locks on each row in t we read with SELECTS lock on the supremum record in t AUTO-INC lock on tIX lock on tProbably implicit X lock on each row inserted (we had not seen this in gdb clearly). We read these rows from the temporary table we’ve created I see two potential problems here.There is a real chance to run (and commit) fast concurrent INSERT (that uses AUTO-INC lock) into the table t while we read all rows from it before we request AUTO-INC lock on the destination. As a result, some of rows inserted may be read during SELECT stage and lead to more rows inserted than we could expect.Also, there is no explicit explanation for this (corner) case of INSERT … SELECT in the manual. I plan to file a documentation bug later.

Using gdb to understand what locks (and when) are really set by InnoDB. Part I.

There is a story behind this post. To make it short, the detailed study presented here started when I noted old Bug #72748 in “No feedback” status few days ago. I considered this status totally wrong and started a discussion on Facebook about it (yes, this is what I usually do, I discuss handling of MySQL bugs in public).The technical topic of this discussion (blaming aside) was the way AUTO-INC locks are set by InnoDB with default innodb_autoinc_lock_mode=1. Manual defines “bulk inserts” concept: “Bulk inserts” Statements for which the number of rows to be inserted (and the number of required auto-increment values) is not known in advance. This includes INSERT … SELECT, REPLACE … SELECT, and LOAD DATA statements, but not plain INSERT. InnoDB will assign new values for the AUTO_INCREMENT column one at a time as each row is processed.  and then says: innodb_autoinc_lock_mode = 1 (“consecutive” lock mode)This is the default lock mode. In this mode, “bulk inserts” use the special AUTO-INC table-level lock and hold it until the end of the statement. This applies to all INSERT … SELECT, REPLACE … SELECT, and LOAD DATA statements. Only one statement holding the AUTO-INC lock can execute at a time. I suspected (based on test case presented by my colleague Peiran last year) that this may not be case in reality, or at least the process is a bit more complicated. In the process of technical discussion, to make my points clear and highlight the underlying problem, I’ve reported another bug, Bug #76533, with simple test case and some speculations (that were not entirely correct as I soon found out).After re-reading my report and re-checking results I was really surprised to see the result that can not be explained by the manual page mentioned above (yes, after all my bug reports for the manual I still consider it correct until the opposite is clearly proved, but that can be reproduced on MySQL 5.5, 5.6 and 5.7.Once when I was in doubts I’ve got a nice advice from my dear friend Sinisa, “Read the Source, Luke!”. But one has probably to spend few days at least to understand enough details to say for sure what happens when one executes INSERT INTO T … SELECT FROM T (same table) comparing to INSERT INTO T … SELECT FROM S (other table). So, I gave good advice myself (that I actually used to give others at FOSDEM 2015): “Use gdb, Luke!”. It was even easier to follow that way after reading this great post by Annamalai.So, I set up simple test, like this:[openxs@centos ~]$ mysql -uroot testReading table information for completion of table and column namesYou can turn off this feature to get a quicker startup with -AWelcome to the MySQL monitor.  Commands end with ; or \g.Your MySQL connection id is 1Server version: 5.6.23-72.1-log Percona Server (GPL), Release 72.1, Revision 0503478Copyright (c) 2009-2015 Percona LLC and/or its affiliatesCopyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.Oracle is a registered trademark of Oracle Corporation and/or itsaffiliates. Other names may be trademarks of their respectiveowners.Type ‘help;’ or ‘\h’ for help. Type ‘\c’ to clear the current input statement.mysql> select @@innodb_autoinc_lock_mode, @@log_bin, @@binlog_format;+—————————-+———–+—————–+| @@innodb_autoinc_lock_mode | @@log_bin | @@binlog_format |+—————————-+———–+—————–+|                          1 |         1 | STATEMENT       |+—————————-+———–+—————–+1 row in set (0.01 sec)mysql> select @@innodb_locks_unsafe_for_binlog;+———————————-+| @@innodb_locks_unsafe_for_binlog |+———————————-+|                                0 |+———————————-+1 row in set (0.00 sec) mysql> create table t(id int auto_increment primary key, val int);Query OK, 0 rows affected (0.21 sec)mysql> create table tt(id int auto_increment primary key, val int);Query OK, 0 rows affected (0.14 sec)mysql> insert into t(val) values (1), (2), (3), (4);Query OK, 4 rows affected (0.02 sec)Records: 4  Duplicates: 0  Warnings: 0mysql> insert into tt(val) values (1), (2), (3), (4);Query OK, 4 rows affected (0.02 sec)Records: 4  Duplicates: 0  Warnings: 0mysql> select * from t;+—-+——+| id | val  |+—-+——+|  1 |    1 ||  2 |    2 ||  3 |    3 ||  4 |    4 |+—-+——+4 rows in set (0.00 sec)mysql> select * from tt;+—-+——+| id | val  |+—-+——+|  1 |    1 ||  2 |    2 ||  3 |    3 ||  4 |    4 |+—-+——+4 rows in set (0.01 sec)That is, on Percona Server 5.6.23 I’ve created two InnoDB tables with auto_increment PRIMARY key and one other integer column. I’ve inserted the same 4 rows into both tables. Now, I wonder what locks are set while executing each the following statements, in the same single session, and in what order they are set:insert into t(val) select 100 from tt;insert into t(val) select 100 from t;as I suspect there is a difference that is not clearly documented. I decided to use gdb to set breakpoints on the following functions: lock_table() lock_rec_lock()row_lock_table_autoinc_for_mysql()Don’t ask me why these, it’s topic for a separate discussion (but you can guess based on the names and/or code review and/or this great post I’ve already mentioned).I’ve made sure there is debug info for the official RPM binaries I’ve used:[root@centos ~]# rpm -qa | grep -i percona… Percona-Server-server-56-5.6.23-rel72.1.el6.x86_64Percona-Server-56-debuginfo-5.6.23-rel72.1.el6.x86_64…So, I’ve attached gdb to mysql process running and set breakpoints:[root@centos ~]# gdb -p `pidof mysqld`GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)…  Loaded symbols for /usr/lib64/mysql/plugin/ha_tokudb.so0x00007fd74b0970d3 in poll () from /lib64/libc.so.6Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.5.x86_64 jemalloc-3.6.0-1.el6.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-33.el6.x86_64 libaio-0.3.107-10.el6.x86_64 libcom_err-1.41.12-21.el6.x86_64 libgcc-4.4.7-11.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64 libstdc++-4.4.7-11.el6.x86_64 nss-softokn-freebl-3.14.3-22.el6_6.x86_64 openssl-1.0.1e-30.el6_6.7.x86_64 zlib-1.2.3-29.el6.x86_64(gdb) set pagination 0(gdb) b lock_tableBreakpoint 1 at 0x91c940: file /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc, line 4426.(gdb) b lock_rec_lockBreakpoint 2 at 0x91de04: file /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc, line 2329.(gdb) b row_lock_table_autoinc_for_mysqlBreakpoint 3 at 0x984504: file /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc, line 1146.(gdb) cContinuing.Then in the session where I created my tables I’ve started transaction to check the first statement:mysql> start transaction;Query OK, 0 rows affected (0.00 sec)mysql> insert into t(val) select 100 from tt;and immediately hit one of breakpoints in gdb:[Switching to Thread 0x7fd74cf79700 (LWP 1866)]Breakpoint 1, lock_table (flags=0, table=0x7fd7233f7de8, mode=LOCK_IS, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:44264426            if (flags & BTR_NO_LOCKING_FLAG) {(gdb) p table->name$1 = 0x7fd7383feae8 “test/tt”So, we had a request for IS lock for the tt table.  I had source code at hand to check for some details that may be useful while interpreting data:– this is from  storage/innobase/include/lock0types.h /* Basic lock modes */enum lock_mode {        LOCK_IS = 0,    /* intention shared */        LOCK_IX,        /* intention exclusive */        LOCK_S,         /* shared */        LOCK_X,         /* exclusive */        LOCK_AUTO_INC,  /* locks the auto-inc counter of a table                        in an exclusive mode */        LOCK_NONE,      /* this is used elsewhere to note consistent read */        LOCK_NUM = LOCK_NONE, /* number of lock modes */        LOCK_NONE_UNSET = 255};– this is from storage/innobase/include/lock0lock.h#define LOCK_WAIT       256     /*!< Waiting lock flag; when set, it                                means that the lock has not yet been                                granted, it is just waiting for its                                turn in the wait queue *//* Precise modes */#define LOCK_ORDINARY   0       /*!< this flag denotes an ordinary                                next-key lock in contrast to LOCK_GAP                                or LOCK_REC_NOT_GAP */#define LOCK_GAP        512     /*!< when this bit is set, it means that the                                lock holds only on the gap before the record;                                for instance, an x-lock on the gap does not                                give permission to modify the record on which                                the bit is set; locks of this type are created                                when records are removed from the index chain                                of records */#define LOCK_REC_NOT_GAP 1024   /*!< this bit means that the lock is only on                                the index record and does NOT block inserts                                to the gap before the index record; this is                                used in the case when we retrieve a record                                with a unique key, and is also used in                                locking plain SELECTs (not part of UPDATE                                or DELETE) when the user has set the READ                                COMMITTED isolation level */#define LOCK_INSERT_INTENTION 2048 /*!< this bit is set when we place a waiting                                gap type record lock request in order to let                                an insert of an index record to wait until                                there are no conflicting locks by other                                transactions on the gap; note that this flag                                remains set when the waiting lock is granted,                                or if the lock is inherited to a neighboring                                record */if gdb would not be kind enough to do this for me (it ‘s, for enum, as you can see).Now, let’s check how we ended up here:(gdb) bt#0  lock_table (flags=0, table=0x7fd7233f7de8, mode=LOCK_IS, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:4426#1  0x0000000000999bdd in row_search_for_mysql (buf=0x7fd71bfed410 “\37504”, mode=1, prebuilt=0x7fd7233e2068, match_mode=Unhandled dwarf expression opcode 0xf3) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0sel.cc:4083#2  0x00000000008e8b41 in ha_innobase::index_read (this=0x7fd71bf9d610, buf=0x7fd71bfed410 “\37504”, key_ptr=Unhandled dwarf expression opcode 0xf3) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:8232#3  0x00000000008dc090 in ha_innobase::index_first (this=0x7fd71bf9d610, buf=0x7fd71bfed410 “\37504”) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:8629#4  0x000000000059a3ee in handler::ha_index_first (this=0x7fd71bf9d610, buf=0x7fd71bfed410 “\37504”) at /usr/src/debug/percona-server-5.6.23-72.1/sql/handler.cc:3057#5  0x00000000006b491d in join_read_first (tab=0x7fd72332dc60) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_executor.cc:2540#6  0x00000000006b3c0d in sub_select (join=0x7fd72332cf40, join_tab=0x7fd72332dc60, end_of_records=Unhandled dwarf expression opcode 0xf3) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_executor.cc:1256#7  0x00000000006b2ea8 in do_select (this=0x7fd72332cf40) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_executor.cc:933#8  JOIN::exec (this=0x7fd72332cf40) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_executor.cc:194#9  0x00000000006fc3c5 in mysql_execute_select (thd=0x7fd71c354000, tables=0x7fd72332c888, wild_num=0, fields=Unhandled dwarf expression opcode 0xf3) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_select.cc:1100#10 mysql_select (thd=0x7fd71c354000, tables=0x7fd72332c888, wild_num=0, fields=Unhandled dwarf expression opcode 0xf3) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_select.cc:1221#11 0x00000000006fcc25 in handle_select (thd=0x7fd71c354000, result=0x7fd72332ce00, setup_tables_done_option=1073741824) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_select.cc:110#12 0x00000000006d977b in mysql_execute_command (thd=0x7fd71c354000) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_parse.cc:3965#13 0x00000000006db7e8 in mysql_parse (thd=0x7fd71c354000, rawbuf=Unhandled dwarf expression opcode 0xf3) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_parse.cc:6922#14 0x00000000006dcf59 in dispatch_command (command=<value optimized out>, thd=0x7fd71c354000, packet=0x7fd72330f001 “insert into t(val) select 100 from tt”, packet_length=Unhandled dwarf expression opcode 0xf3) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_parse.cc:1434#15 0x00000000006aa772 in do_handle_one_connection (thd_arg=Unhandled dwarf expression opcode 0xf3) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_connect.cc:1532#16 0x00000000006aa860 in handle_one_connection (arg=0x7fd71c354000) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_connect.cc:1443#17 0x0000000000adc113 in pfs_spawn_thread (arg=0x7fd71c3443e0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/perfschema/pfs.cc:1860#18 0x00007fd74c9869d1 in start_thread () from /lib64/libpthread.so.0#19 0x00007fd74b0a08fd in clone () from /lib64/libc.so.6So, we started with SELECT part and set IS lock on the table we read from tt. This is expected, as manual says: “INSERT INTO T SELECT … FROM S WHERE … sets an exclusive index record lock (without a gap lock) on each row inserted into T. If the transaction isolation level is READ COMMITTED, or innodb_locks_unsafe_for_binlog is enabled and the transaction isolation level is not SERIALIZABLE, InnoDB does the search on S as a consistent read (no locks). Otherwise, InnoDB sets shared next-key locks on rows from S. InnoDB has to set locks in the latter case:”Our case is REPEATABLE READ and innodb_locks_unsafe_for_binlog is not enabled, so some kind of S locks will be set on rows, and it means we need IS lock on the table (again, by the manual). Let’s continue:(gdb) cContinuing.Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249ae700, heap_no=2, index=0x7fd7233f82e8, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:23292329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {(gdb) p index->table_name$2 = 0x7fd7383feae8 “test/tt”(gdb) p index->name$3 = 0x7fd7233f8480 “PRIMARY”So, we clearly see an attempt to set a record lock on some row (heap_no=2) in the PRIMARY key of the table tt (this is a row of data, surely). mode=2 means LOCK_S in the enum. So this is an attempt to set S lock on the row (“shared next-key lock” as manual explained above, one day I’ll explain how to distinguish it from other kinds of shared locks if needed). Let’s continue:(gdb) cContinuing.Breakpoint 3, row_lock_table_autoinc_for_mysql (prebuilt=0x7fd7233e3068) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc:11461146            if (trx == table->autoinc_trx) {(gdb) p trx$4 = <value optimized out>(gdb) p table->namevalue has been optimized out(gdb) p table$5 = <value optimized out>(gdb) bt#0  row_lock_table_autoinc_for_mysql (prebuilt=0x7fd7233e3068) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc:1146#1  0x00000000008e8dcc in ha_innobase::innobase_lock_autoinc (this=0x7fd71bf9cc10) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:7084#2  0x00000000008e9128 in innobase_get_autoinc (this=0x7fd71bf9cc10, offset=1, increment=1, nb_desired_values=1, first_value=0x7fd74cf767c0, nb_reserved_values=0x7fd74cf767d0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:13890#3  ha_innobase::get_auto_increment (this=0x7fd71bf9cc10, offset=1, increment=1, nb_desired_values=1, first_value=0x7fd74cf767c0, nb_reserved_values=0x7fd74cf767d0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:13961#4  0x000000000059ad18 in handler::update_auto_increment (this=0x7fd71bf9cc10) at /usr/src/debug/percona-server-5.6.23-72.1/sql/handler.cc:3472#5  0x00000000008f06e9 in ha_innobase::write_row (this=0x7fd71bf9cc10, record=0x7fd71bfec810  <incomplete sequence \375>) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:7271…#19 0x00000000006dcf59 in dispatch_command (command=<value optimized out>, thd=0x7fd71c354000, packet=0x7fd72330f001 “insert into t(val) select 100 from tt”, packet_length=Unhandled dwarf expression opcode 0xf3) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_parse.cc:1434… I had not checked the code in time, so attempts to print something I see had not helped, but top lines of backtrace clearly show: we want to get auto_increment value for the row we are planning to write somewhere (to the table t). So, we set IS lock on table read from, we set S lock on the first row we tried to read from it via SELECT and then we try to insert the row into the target table t, and we need auto_increment value for it. Let’s continue:(gdb) cContinuing.Breakpoint 1, lock_table (flags=0, table=0x7fd7233f69e8, mode=LOCK_AUTO_INC, thr=0x7fd7233e3780) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:44264426            if (flags & BTR_NO_LOCKING_FLAG) {(gdb) p table->name$6 = 0x7fd7383feac0 “test/t”(gdb) p thr$7 = (que_thr_t *) 0x7fd7233e3780(gdb) p *thr$8 = {common = {type = 9, parent = 0x7fd7233e36b8, brother = 0x0, val = {data = 0x0, ext = 0, len = 0, type = {prtype = 0, mtype = 0, len = 0, mbminmaxlen = 0}}, val_buf_size = 0}, magic_n = 8476583, child = 0x7fd7233e3548, graph = 0x7fd7233e36b8, state = 1, is_active = 1, run_node = 0x7fd7233e3548, prev_node = 0x7fd7233e3548, resource = 0, lock_state = 0, slot = 0x0, thrs = {prev = 0x0, next = 0x0}, trx_thrs = {prev = 0x0, next = 0x0}, queue = {prev = 0x0, next = 0x0}, fk_cascade_depth = 0}So,  we clearly see the attempt to set LOCK_AUTO_INC on the table t. This happens only after we read some row from the table tt and now inserting it into the target table, t.  (gdb) cContinuing.Breakpoint 1, lock_table (flags=0, table=0x7fd7233f69e8, mode=LOCK_IX, thr=0x7fd7233e3780) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:44264426            if (flags & BTR_NO_LOCKING_FLAG) {(gdb) p table->name$9 = 0x7fd7383feac0 “test/t”(gdb) p *table$10 = {id = 65, heap = 0x7fd71c3514a0, name = 0x7fd7383feac0 “test/t”, dir_path_of_temp_table = 0x0, data_dir_path = 0x0, space = 51, flags = 1, flags2 = 80, ibd_file_missing = 0, cached = 1, to_be_dropped = 0, n_def = 5, n_cols = 5, can_be_evicted = 1, corrupted = 0, drop_aborted = 0, cols = 0x7fd71c371868, col_names = 0x7fd71c3718f0 “id”, name_hash = 0x0, id_hash = 0x0, indexes = {count = 1, start = 0x7fd7233f7b68, end = 0x7fd7233f7b68}, foreign_set = std::set with 0 elements, referenced_set = std::set with 0 elements, table_LRU = {prev = 0x7fd74902cb68, next = 0x7fd7233f7de8}, fk_max_recusive_level = 0, n_foreign_key_checks_running = 0, def_trx_id = 84487, query_cache_inv_trx_id = 84497, big_rows = 0, stats_latch_created = 2, stats_latch = 0x7fd71c3efb80, stat_initialized = 1, memcached_sync_count = 0, stats_last_recalc = 1427781732, stat_persistent = 0, stats_auto_recalc = 0, stats_sample_pages = 0, stat_n_rows = 4, stat_clustered_index_size = 1, stat_sum_of_other_index_sizes = 0, stat_modified_counter = 0, stats_bg_flag = 0 ’00’, autoinc_lock = 0x7fd71c3718a8, autoinc_mutex = {event = 0x7fd71c3ef600, lock_word = 0 ’00’, waiters = 0, list = {prev = 0x7fd7233f7f50, next = 0x7fd71c370f28}, count_os_wait = 0, cmutex_name = 0xc3cb37 “&table->autoinc_mutex”, pfs_psi = 0x0}, autoinc = 6, n_waiting_or_granted_auto_inc_locks = 1, autoinc_trx = 0x7fd71c370c68, fts = 0x0, quiesce = QUIESCE_NONE, n_rec_locks = 0, n_ref_count = 1, locks = {count = 1, start = 0x7fd71c3718a8, end = 0x7fd71c3718a8}, is_corrupt = 0}(gdb) p table->autoinc_lock$11 = (ib_lock_t *) 0x7fd71c3718a8(gdb) p *(table->autoinc_lock)$12 = {trx = 0x7fd71c370c68, trx_locks = {prev = 0x7fd71c373530, next = 0x0}, type_mode = 20, hash = 0x0, index = 0x0, un_member = {tab_lock = {table = 0x7fd7233f69e8, locks = {prev = 0x0, next = 0x0}}, rec_lock = {space = 140561986054632, page_no = 0, n_bits = 0}}} Now, we set IX lock on table T (no wonder, we need to inserts rows there). According to the comments in the code:– this is from storage/innobase/include/lock0priv.h static const byte lock_compatibility_matrix[5][5] = { /**         IS     IX       S     X       AI */ /* IS */ {  TRUE,  TRUE,  TRUE,  FALSE,  TRUE}, /* IX */ {  TRUE,  TRUE,  FALSE, FALSE,  TRUE}, /* S  */ {  TRUE,  FALSE, TRUE,  FALSE,  FALSE}, /* X  */ {  FALSE, FALSE, FALSE, FALSE,  FALSE}, /* AI */ {  TRUE,  TRUE,  FALSE, FALSE,  FALSE}};…/* STRONGER-OR-EQUAL RELATION (mode1=row, mode2=column) *    IS IX S  X  AI * IS +  –  –  –  – * IX +  +  –  –  – * S  +  –  +  –  – * X  +  +  +  +  + * AI –  –  –  –  + * See lock_mode_stronger_or_eq(). */static const byte lock_strength_matrix[5][5] = { /**         IS     IX       S     X       AI */ /* IS */ {  TRUE,  FALSE, FALSE,  FALSE, FALSE}, /* IX */ {  TRUE,  TRUE,  FALSE, FALSE,  FALSE}, /* S  */ {  TRUE,  FALSE, TRUE,  FALSE,  FALSE}, /* X  */ {  TRUE,  TRUE,  TRUE,  TRUE,   TRUE}, /* AI */ {  FALSE, FALSE, FALSE, FALSE,  TRUE}};IX lock is compatible with AI (and AI is hold by this same transaction anyway), so let’s assume we got it and continue:(gdb) cContinuing.Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249ae700, heap_no=3, index=0x7fd7233f82e8, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:23292329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {(gdb) p index->table_name$13 = 0x7fd7383feae8 “test/tt”(gdb) p index->name$14 = 0x7fd7233f8480 “PRIMARY”Now we set S lock (mode=2) on the next row (heap_no=3) we read in SELECT. We continue reading…(gdb) cContinuing.Breakpoint 3, row_lock_table_autoinc_for_mysql (prebuilt=0x7fd7233e3068) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc:11461146            if (trx == table->autoinc_trx) {(gdb) cContinuing.So, request for auto_increment value again (we ask for the lock that we already have), to get it for the row we plan to insert. All clear, so we continued and got this:[Switching to Thread 0x7fd716bf7700 (LWP 1718)]Breakpoint 1, lock_table (flags=0, table=0x7fd74902cb68, mode=LOCK_IX, thr=0x7fd71c1db638) at /usr/src/debug/percona-serverinnodb_table-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:44264426            if (flags & BTR_NO_LOCKING_FLAG) {(gdb) p table->name$15 = 0x7fd730396c60 “mysql/innodb_table_stats”So, other thread stepped in to set IX lock on innodb_table_stats table. This is a background update of table statistics and there will be many hits for our breakpoints because of this. I’ll skip them all from now on, to keep concentrated on one main topic. For our tables in question we later had already well know sequences:…Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249ae700, heap_no=5, index=0x7fd7233f82e8, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:23292329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {(gdb) p index->table_name$17 = 0x7fd7383feae8 “test/tt”(gdb) cContinuing.Breakpoint 3, row_lock_table_autoinc_for_mysql (prebuilt=0x7fd7233e3068) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc:11461146            if (trx == table->autoinc_trx) {(gdb) cContinuing….that is, requests for S lock on yet another row in the source table (tt) and then request for the auto_increment. It ends up like this (changes to persistent statistics tables skipped):Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249ae700, heap_no=1, index=0x7fd7233f82e8, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:23292329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {(gdb) p index->table_name$20 = 0x7fd7383feae8 “test/tt”(gdb) p block$21 = (const buf_block_t *) 0x7fd7249ae700(gdb) p *block$22 = {page = {space = 52, offset = 3, buf_fix_count = 1, io_fix = 0 ’00’, state = 5 ’05’, flush_type = 1, buf_pool_index = 0, zip = {data = 0x0, m_end = 0, m_nonempty = 0, n_blobs = 0, ssize = 0}, hash = 0x0, list = {prev = 0x7fd7249cf280, next = 0x0}, newest_modification = 129192199, oldest_modification = 0, LRU = {prev = 0x7fd7249ae580, next = 0x7fd7249ae880}, old = 0, freed_page_clock = 0, access_time = 1852500389, is_corrupt = 0}, frame = 0x7fd72c438000 “\333s\257\247”, unzip_LRU = {prev = 0x0, next = 0x0}, mutex = {event = 0x7fd72400e900, lock_word = 0 ’00’, waiters = 0, list = {prev = 0x7fd7249ae900, next = 0x7fd7249ae600}, count_os_wait = 0, cmutex_name = 0xc375cd “&block->mutex”, pfs_psi = 0x0}, lock = {lock_word = 1048575, waiters = 0, recursive = 0, writer_thread = 140561776342784, event = 0x7fd72400e980, wait_ex_event = 0x7fd72400ea00, list = {prev = 0x7fd7249ae940, next = 0x7fd7249ae640}, pfs_psi = 0x0, count_os_wait = 0, lock_name = 0xc375db “&block->lock”, last_s_file_name = 0xc281a0 “/mnt/workspace/percona-server-5.6-redhat-binary/label_exp/centos6-64/rpmbuild/BUILD/percona-server-5.6.23-72.1/storage/innobase/row/row0sel.cc”, last_x_file_name = 0xc34650 “/mnt/workspace/percona-server-5.6-redhat-binary/label_exp/centos6-64/rpmbuild/BUILD/percona-server-5.6.23-72.1/storage/innobase/btr/btr0btr.cc”, writer_is_wait_ex = 0, last_s_line = 3226, last_x_line = 741}, lock_hash_val = 39242, check_index_page_at_flush = 1, modify_clock = 1, n_hash_helps = 0, n_fields = 1, n_bytes = 0, left_side = 1, curr_n_fields = 0, curr_n_bytes = 0, curr_left_side = 0, index = 0x0}(gdb) cContinuing.Note heap_no=1 above. This is a supremum record. I’ve also checked some details you can get for the block (page in the buffer pool) in the process. At this moment we’ve finally got rows inserted:mysql> insert into t(val) select 100 from tt;Query OK, 4 rows affected, 1 warning (31 min 9.51 sec)Records: 4  Duplicates: 0  Warnings: 1mysql> show warnings\G*************************** 1. row ***************************  Level: Note   Code: 1592Message: Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-increment column after selecting from another table are unsafe because the order in which rows are retrieved determines what (if any) rows will be written. This order cannot be predicted and may differ on master and the slave.1 row in set (34.72 sec)As you can assume for the above (time to execute show warnings)  we hit some breakpoints as InnoDB continues to update persistent statistics for the table t. I’ve later executed the following:mysql> show engine innodb status\G…————TRANSACTIONS————Trx id counter 84515Purge done for trx’s n:o < 84514 undo n:o < 0 state: runningHistory list length 371LIST OF TRANSACTIONS FOR EACH SESSION:—TRANSACTION 84514, ACTIVE 223 sec2 lock struct(s), heap size 360, 2 row lock(s), undo log entries 2—TRANSACTION 84511, ACTIVE 2394 sec3 lock struct(s), heap size 360, 5 row lock(s), undo log entries 4MySQL thread id 1, OS thread handle 0x7fd74cf79700, query id 18 localhost root initshow engine innodb status——–…————–ROW OPERATIONS————–0 queries inside InnoDB, 0 queries in queue0 read views open inside InnoDB2 RW transactions active inside InnoDB0 RO transactions active inside InnoDB2 out of 1000 descriptors usedMain thread process no. 1681, id 140561818302208, state: sleepingNumber of rows inserted 12, updated 0, deleted 0, read 120.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/sSo, you see transaction that made 2 changes in the background thead (this is reated to persitent statistics table), and our transaction that made 4 changes (inserted 4 rows). You see 5 row locks (and surely will NOT see AUTO-INC table one no matter how you try, as statement is completed, but it was there). These are 5 S locks set on the tt table we read (4 rows read and supremum record). But where our X locks on rows inserted into the t table (transaction is still active)? Here they are:mysql> select * from t;+—-+——+| id | val  |+—-+——+|  1 |    1 ||  2 |    2 ||  3 |    3 ||  4 |    4 ||  5 |  100 ||  6 |  100 ||  7 |  100 ||  8 |  100 |+—-+——+8 rows in set (0.00 sec)They exist, but they are implicit. You’ve probably noted that there was no call for X record lock mentioned, and they had NOT happened. Implicit locks do not have accociated lock object with them, they are present just because the trx_id for the transaction that changed (inserted, in this case) the row. Annamalai explained them well here.Probably it’s time to stop, as this post is already big enough. To summarize, the following locks are set (and in the following order) while executing INSERT INTO t … SELECT FROM tt where there is an auto_increment column for which the value is generated in the destination table with default innodb_aunoinc_lock_mode=1 according to our tracing in gdb:IS lock on the source table (tt)S lock on the first row in tt we read with SELECTAUTO-INC lock on the destination table (t)IX lock on the destination table (t)Probably implicit X lock on the row inserted (we had not seen this in gdb clearly)S lock on the next row in tt we read with SELECTProbably implicit X lock on the row inserted (we had not seen this in gdb clearly)…S lock on the supremum in tt we read with SELECTI see three potential problems here.There is a small chance to run fast concurrent INSERT (that uses AUTO-INC lock) into the destination table after we read the first row from the source table, but before we request AUTO-INC lock on the destination.Also, try to find explanation for this case in the manual. It says nothing on when AUTO-INC lock is set.The last but not the least, I do not see any explanation of implicit X locks in the manual. I had not try hard yet.For these (and more) I’ll report bugs. The story will be continued in the next blog post(s) in this series. As for that “day” when I plan ot tell you more about InnoDB locks, it’s predefined. Please, visit this talk at Percona Live 2015. There Nilnanadan and me (let’s hope) will tell you notably more based on our findings and experience.

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569