Author: Valeriy Kravchuk

Problems of Oracle’s MySQL as an Open Source Product

In my previous summary blog post I listed 5 problems I see with the way Oracle handles MySQL server development. The first of them was that “Oracle does not develop MySQL server in a true open source way” and this is actually what I started my draft of that entire blog post with. Now it’s time to get into details, as so far there was mostly fun around this and statements that MariaDB also could do better in the related Twitter discussion I had.So, let me explain what forces me to think that Oracle is treating MySQL somewhat wrong for the open source product.

Nice pathway on this photo, but it’s not straight and it’s not clear where it goes. Same with MySQL development…
We get MySQL source code updated at GitHub only when (or, as it often happened in the past, some time after) the official release of new version happens. You can see, for example, that MySQL 8.0 source code at GitHub was actually last time updated on April 3, 2018, while MySQL 8.0.11 GA was released officially on April 19, 2018 (and that’s when new code became really available in public repository). We do not see any code changes later than April 3, while it’s clear that there are bug fixes already implemented for MySQL 8.0.12 (see Bug #90523 – “[MySQL 8.0 GA Release Build] InnoDB Assertion: (capacity & (capacity – 1)) == 0”, for example. There is an easy way to crash official MySQL 8.0.11 binaries upon startup, fixed back before April 30, with some description of the fix even, but no source code of the fix is published) and 8.0.13 even (see Bug #90999 – “Bad usage of ppoll in libmysql”). With Oracle’s approach to sharing the source code, we can not see the fixes that are already made long time ago, apply them, test them or comment on them. This is fundamentally wrong, IMHO, for any open source software.In other projects we usually can see the code as soon as it is pushed to the branch (check MariaDB if you care, last change few hours ago at the moment). Main branches may have more strict rules for updating, but in general we see fixes as they happen, not only when new official release happens.
Side note: if you see that Bug #90523 became private after I mentioned it here, that’s another wrong thing they often do. More on the in the next post, on community bug reports handling by Oracle… Interesting enough, when the fix comes from community we can usually see the patch. This happened to the Bug #90999 mentioned above – we have a fix provided by Facebook and one can see the patch in Bug #91067 – “Contribution by Facebook: Do not use sigmask in ppoll for client libraries”. When somebody makes pull request, patch source is visible. But one can never be sure if it’s the final patch and had it passed all the usual QA tests and reviews, or what happens to pull requests closed because developer had not signed the agreement…If the fix is developed by Oracle you’ll see the code changed only with/after the official release. Moreover, it would be on you to identify the exact commit(s) that introduced the fix. For a long time Laurynas Biveinis from Percona cared to add comments about the exact commit that fixed the bug to public bug reports (see Bug #77689 – “mysql_execute_command SQLCOM_UNLOCK_TABLES redundant trans_check_state check?” as one of examples). Community members have to work hard to “reverse engineer” Oracle’s fixes and link them back to details of real problems (community bug reports) they were intended to resolve!Compare this to a typical changelog of MariaDB that leads you directly to commits and code changes.What’s even worse, Oracle started a practice to publish only part of their changes made for the release. Some tests, those for “security” bugs, are NOT published even if we assume they exist or even can be 100% sure they exist.My recent enough favorite example is the “The CREATE TABLE of death” bug reported by Jean-François Gagné. If you follow his blog post and links in it you can find out all the details, including the test case that is public in MariaDB. With this public information you can go and crash any affected older MySQL versions. Bug reporter did everything to inform affected vendors properly, and responsible vendors disclosed the test (after they fixed the problem)!Now, try to find similar test in public GitHub tree of Oracle MySQL. I tried to find it literally, try to find references to somewhat related public bug numbers etc, but failed. If you know better and can identify the related public test at GitHub, please, add a comment and correct me!To summarize, this is what I am mostly concerned about:
Public source code is updated only with the releases. There are no feature-specific code branches, development branches, just nothing public until the official release.
Oracle does not provide any details about commits and their relations to bugs fixed in the release notes or anywhere else outside GitHub. One has to go study the source code to make his own conclusions.
Oracle does not share some of test cases in their commits. So, some test cases remain non-public and we can only guess (based on code analysis) what was the real intention of the fix. This applies to security bugs and who knows to what else.
I would not go into other potential problems (I’ve heard about some others from developers, for example, related to code refactoring Oracle does) or more details. The above is enough for me to state that Oracle do wrong things with the way they publish source code and threat MySQL as open source product.All the problems mentioned above were introduced by Oracle, these never happened in MySQL AB or Sun. MariaDB and Percona servers may have their own problems, but the above do NOT apply to them, so I state that other vendors develop MySQL forks and related projects differently, and still are in business and doing well!

On InnoDB Data Compression in MySQL

Another story that I’ve prepared back in April for my meeting with one of customers in London was a “compression story”. We spent a lot of time on it in several support issues in the past, with only limited success.In case of InnoDB tables, there are actually two ways to compress data (besides relying on filesystem compression or compressing individual columns at server or application side). Historically the first one was introduced by the Barracuda InnoDB file format and ROW_FORMAT=COMPRESSED it supported. Notable number of related bugs were reported with time, and it may be not that easy to identify them all (you can find current list of bugs tagged with “compression” here). I’ve picked up the following bugs for my “story”:

Bug #88220 – “compressing and uncompressing InnoDB tables seems to be inconsistent”. Over years Simon Mudd, Monty Solomon (see related Bug #70534 – “Removing table compression leaves compressed keys”) and other community members reported several bugs related to inconsistencies and surprises with key_block_size option. It is used for both MyISAM and InnoDB storage engines (for compressed tables) and it seems nobody is going to fix the remaining problems until they are gone with MyISAM engine.

Bug #69588 – “MyISAM to InnoDB compressed slower than MyISAM to InnoDB, Then InnoDB to Compressed”. Just a detail to take into account, noted 5 years ago by Joffrey MICHAIE, verified almost 4 years ago and then getting zero public attention from Oracle engineers.

Bug #62431 – “What is needed to make innodb compression work for 32KB pages?”. Nothing can be done according to the manual:”In particular, ROW_FORMAT=COMPRESSED in the Barracuda file format assumes that the page size is at most 16KB and uses 14-bit pointers.”

Bug #78827 – “Speedup replication of compressed tables”. Come on, Daniël van Eeden, nobody cares that”Replication and InnoDB compressed tables are not efficiently working together.”The bug is still “Open”.

Bug #75110 – “Massive, to-be-compressed not committed InnoDB table is total database downtime”. This problem was reported by Jouni Järvinen back in 2014. Surely this is not a bug, but it seems nobody even tried to speed up compression in any way on multiple cores.

Bug #84439 – “Table of row size of ~800 bytes does not compress with KEY_BLOCK_SIZE=1”. It was reported by Jean-François Gagné, who asked for a reasonable error message at least. Nothing happens after verification.

Bug #77089 – “Misleading innochecksum error for compressed tables with key_block_size=16″. This problem was reported by Laurynas Biveinis more than three years ago, immediately verified and then got zero attention.

The boats above do not use the space for mooring efficiently. They need better compression.

Transparent Page Compression for InnoDB tables was added later and looked promising. If you are lucky to use filesystem with sparse file and hole punching support and proper OS or kernel version, then you could expect notable saving of disk space with very few additional keystrokes (like COMPRESSION=”zlib”) when defining the table. Different compression libraries were supported. Moreover (see here), only uncompressed pages are stored in memory in this case, and this improved the efficiency of buffer pool usage. Sounded promising originally, but there are still bugs to consider:

Bug #78277 – “InnoDB deadlock, thread stuck on kernel calls from transparent page compression”. This bug alone (reported by Mark Callaghan back in 2015) may be a reason to NOT use the feature in production, as soon as you hit it (chances are high). there are many interesting comments that there are environments where the feature works as fast as expected, but I think this summary is good enough for most users:”[19 Oct 2015 15:56] Mark Callaghan…Slow on XFS, slow on ext4, btrfs core team tells me it will be slow there. But we can celebrate that it isn’t slow on NVMFS – closed source, not GA, can’t even find out where to buy it, not aware of anyone running it.”The bug is still “Open”.

Bug #81145 – “Sparse file and punch hole compression not working on Windows”. Not that I care about Windows that much, but still. The bug is “Verified” for 2 years.

Bug #87723 – “mysqlbackup cannot work with mysql5.7 using innodb page-level compression” Now this is awesome! Oracle’s own MySQL Enterprise Backup does NOT support the feature. Clearly they cared about making it useful… As a side note, same problem affects Percona’s xtrabackup (see PXB-1394). MariaDB resolved the problem (and several related ones like MDEV-13023) with mariabackup tool.

Bug #87603 – “compression/tablespace ignored in create/alter table when not using InnoDB”. COMPRESSION=’…/’ option is supported for MyISAM tables as well, and this again leads to problems when switching to another storage engine, as Tomislav Plavcic noted.

Bug #78672 – “assert fails in fil_io during linkbench with transparent innodb compression”. This crash (assertion failure) was noted by Mark Callaghan back in 2015. May not crash anymore since 5.7.10 according to the last comment, but nobody cares to close the bug or comment anything useful. The bug is still “Verified”.

That’s almost all I prepared for my “compression story”. It had to be sad one.What about the moral of the story? For me it’s the following:
Classical InnoDB compression (page_format=compressed) has limited efficiency and does not get any attention from developers recently. If you hit some problem with this feature you have to live with it.
Transparent page compression for InnoDB seems to be originally more like a proof of concept in MySQL that may not work well in production on commodity hardware, and software and was not integrated with backup tools. MariaDB improved it, added support for backing up page compressed tables efficiently with the same familiar xtrabackup-based approach, but there are still open problems to resolve (see MDEV-15527 and MDEV-15528 that I also picked up for my “story”).
It seems (based on public sources review at least) that both compression options do not get much attention from Oracle developers recently. If you check new features of MySQL 8.0 GA here,  you may notice that zlib version is updated, compressed temporary InnoDB tables are no longer supported and… that’s all about compression for InnoDB!
This story could probably be shortened to just one link to the summary post by Mark Callaghan from Facebook (who studied the efficiency of data compression by various engines a lot, among other performance metrics), or by simple statement that if you want data to be compressed efficiently at server side do NOT use current InnoDB implementations and better use RocksDB engine (with MariaDB or Percona Server if you need other modern features also). But I can not write any story about MySQL without referring to some bugs, and this is how I’ve ended up with the above.What if you just switched to MySQL 8.0 GA and need some new features from it badly? Then just wait for a miracle to happen (and hope Percona will make it one day 🙂

On Partitioning in MySQL

Back in April I was preparing for vacations that my wife and I planned to spend in UK. Among other things planned I wanted to visit a customer’s office in London and discuss few MySQL and MariaDB related topics, let’s call them “stories”. I tried to prepare myself for the discussion and collected a list of known active bugs (what else could I do as MySQL entomologist) for each of them. Surely live discussion was not suitable to share lists of bugs (and for some “stories” they were long), so I promised to share them later, in my blog. Time to do what I promised had finally come!One of the stories we briefly discussed was “partitioning story”. Right now I can immediately identify at least 47 active MySQL bugs in the related category.  While preparing I checked the same list and picked up 15 or so bug reports that had to illustrate my points. Let me share them here in no specific order, and add few more.In April the latest still active bug in partitioning reported by MySQL community was  Bug #88916 – “Assertion `table->s->db_create_options == part_table->s->db_create_options'”, from my colleague Elena Stepanova. Note a very simple test case that leads to assertion in debug builds, immediately verified.Recently two more bugs were reported. Reporter of Bug #91190 – “DROP PARTITION and REORGANIZE PARTITION are slow” suspects a performance regression in MySQL 8.0.11. I’ve subscribed to this bug and is following the progress carefully. Same with Bug #91203 – “For partitions table, deal with NULL with is mismatch with reference guide”. I think what happens with NULL value and range partitioning perfectly matches the manual, but the fact that INFORMATION_SCHEMA.PARTITIONS table may return wrong information after dropping partition with NULL value is somewhat unexpected.Now back to the original lists for the “story” I prepared in April:

Bug #60023 – “No Loose Index Scan for GROUP BY / DISTINCT on InnoDB partitioned table”. It was reported by Rene’ Cannao’ and since 2013 I strongly suspect that it’s fixed in MySQL 5.6+ or, as noted in another comment, may depend on statistics properly collected for the table. Still the status remains “Verified”.

Bug #78164 – “alter table command affect partitioned table data directory”. Your custom DATA DIRECTORY settings may get lost when ALTER is applied to the whole table. Quick test shows that at least in MariaDB 10.3.7 this is no longer the case. The bug is still “Verified”.

Bug #85126 – “Delete by range in presence of partitioning and no PK always picks wrong index”. It was reported by Riccardo Pizzi 16 months ago, immediately verified (without explicit list of versions affected, by the way). One more case when ordering of indexes in CREATE TABLE may matter…

Bug #81712 – “lower_case_table_names=2 ignored on ADD PARTITION on Windows”. Who cares about Windows these days?

Bug #84356 – “General tablespace table encryption”. It seems partitioning allows to overcome documented limitation. If this is intended, then the manual is wrong, otherwise I suspect the lack of careful testing of partitioning integration with other features.

Bug #88673 – “Regression CREATE TBL from 5.7.17 to 20 (part #1: innodb_file_per_table = ON).” I’ve probably mentioned this bug reported by Jean-François Gagné in more than one blog post already. Take care and do not use long partition names.

Bug #85413 – “Failing to rename a column involved in partition”. As simple as it sounds, and it still happens.

Bug #83435 – “ALTER TABLE is very slow when using PARTITIONED table”. It was reported by Roel Van de Paar back in 2016 and still remains “Verified”.

Bug #73084 – “Exchanging partitions defined with DATA DIRECTORY and INDEX DIRECTORY options”. The bug still remains “Open” (see Bug #77772 also).

Bug #73648 – “innodb table replication is very slow with some of the partitioned table”. It seems to be fixed last year as internal Bug #25687813 (see release notes for 5.6.38), but nobody cares to find this older duplicate and change its status or re-verify it.

Bug #83750 – “Import via TTS of a partitioned table only uses 1 cpu core”. This feature requested by Daniël van Eeden makes a lot of sense. I truly hope to see parallel operations implemented for partitioned tables in GA MySQL versions (as I saw some parallel processing for partitions done for some upcoming “6.1” or so version back in 2008 in Riga during the MySQL’s last company meeting I’ve attended).

Bug #64498 – “Running out of file handles when ALTERing partitioned MyISAM table”. Too many file handles are needed. This is a documented limitation that DBAs should still take into account.
I also prepared a separate small list of partition pruning bugs:

Bug #83248 – “Partition pruning is not working with LEFT JOIN”. I’ve reported it back in 2016 and it is still not fixed. There are reasons to think it is not so easy.

Bug #75085 – “Partition pruning on key partitioning with ENUM”. It was reported by  Daniël van Eeden back in 2014!

Bug #77318 – “Selects waiting on MDL when altering partitioned table”. One of the worst expectations DBA may have is that partitioned tables help to workaround “global” MDL locks because of partition pruning! This is not the case.
Does this story have any moral? I think so, and for me it’s the following:
Partitioning bugs do not get proper attention from Oracle engineers. We see bugs with wrong status and even a bug with a clear test case and a duplicate that is “Open” for 4 years. Some typical use cases are affected badly, and still no fixes (even though since 5.7 we have native partitioning in InnoDB and changing implementation gave good chance to review and either fix or re-check these bugs).
MySQL DBAs should expect all kinds of surprises when running usual DDL statements (ALTER TABLE to add column even) with partitioned tables. In the best case DDL is just unexpectedly slow for them.
Partition pruning may not work they way one expects.
We miss parallel processing for partitioned tables. They should allow to speed up queries and DDL, not to slow them down instead…
One can suspect that there is no careful internal testing performed on integration of partitioning with other features, or even basic partition maintenance operations.

Fun with Bugs #66 – On MySQL Bug Reports I am Subscribed to, Part VI

I have some free time today, but I am still lazy enough to work on numerous planned and pending “ToDo” kind of posts, so why not to continue review of older MySQL bugs I am subscribed to. Today I am going to list 15 more bugs reported more than a year ago and still not fixed:

Bug #85805 – “Incorrect ER_BAD_NULL_ERROR after LOAD DATA LOCAL INFILE”. This detailed bug report by Tsubasa Tanaka stays “Verified” for more than a year already. It’s a great example of gdb use for MySQL troubleshooting. Setting a couple of breakpoints may really help to understand how MySQL works and why some weird errors happen.

Bug #85536 – “Build error on 5.5.54”. It’s clear that almost nobody besides Roel Van de Paar cares about build problem of MySQL 5.5.x(!) on Ubuntu 16.10(!). Anyway, it’s strange that the bug remains “Verified” and not closed in any way if Oracle really does not intend to support MySQL 5.5 any longer. For now it seems MySQL 5.5 is still under extended support, so I hope to see this build problem fixed with some final 5.5.x release.

Bug #85501 – “Make all options settable as variables in configuration files”. We usually see Umesh Shastry processing bugs reported by other, but this is a rare case when he reports something himself. It’s a great feature request.

Bug #85447 – “Slave SQL thread locking issue on a certain XA workload on master”. There are good reasons to think that this bug reported by Laurynas Biveinis may be fixed since MySQL 5.7.18, but no one cares to close it properly.

Bug #85382 – “Getting semi-sync reply magic number errors when slave_compressed_protocol is 1”. This bug was reported by Jaime Sicam. Read also comments from other community members and make your own conclusions. It seems setting slave_compressed_protocol to 1 is a bad idea in general…

Bug #85191 – “performance regression with HANDLER READ syntax”. Zhai Weixiang found clear performance regression in the way MySQL 5.7 uses metadata locking for HANDLER commands.

Bug #85016 – “better description for: OS error: 71”. Clear and simple request from Shane Bester still stays “Verified”. I am not that Oracle customer affected anyway, but this seems strange to me.

Bug #84958 – “InnoDB’s MVCC has O(N^2) behaviors”. This one bug report from Domas Mituzas could be a topic for a series of blog posts… It clearly states that:”if there’re multiple row versions in InnoDB, reading one row from PK may have O(N) complexity and reading from secondary keys may have O(N^2) complexity”There is a patch that partially fixes the problem submitted by Laurynas Biveinis and created by Alexey Midenkov. While this bug is still “Verified” take carer when using secondary indexes in concurrent environments when the same data are often changed.

Bug #84868 – “Please make it possible to query replication information consistently”. Great feature request (or bug report, if you consider inconsistency as a bug) from Simon Mudd.

Bug #84615 – “More steps in connection processlist state/ events_stages”. Sveta Snirnova cared to ask to split some well known statement execution stages like “cleaning up” into more detailed ones. I think this is really important to simplify troubleshooting with performance_schema. Wrong/misleading/too generic stages forces to use other tools and may lead to wrong conclusions. I hit this with “statistics” also, see Bug #84858. Rare case when Sveta’s request just stays “Open”, for more than a year already.

Bug #84467 – “ALTERing KEY_BLOCK_SIZE keeps the old kbs in KEYs.”. Jean-François Gagné and other well known bug reporters found several problems related to KEY_BLOCK_SIZE. It seems Oracle engineers decided NOT to fix them (see Bug #88220). But then why this bug still stays “Verified”? Consistency in bugs processing is one of my dreams…

Bug #84439 – “Table of row size of ~800 bytes does not compress with KEY_BLOCK_SIZE=1.” Yet another bug report from Jean-François Gagné. Based on lack of activity, those looking for smaller data size, compression etc should look elsewhere and do not expect much from Oracle’s InnoDB. Question is, what other engines with data compression will be supported by Oracle’s MySQL 8 (or 9) GA? When you get tired wondering, consider MariaDB or Percona Server instead – they do support storage engines that are both transactional and were designed with write efficiency and space efficiency in mind. Hint: they rock…

Bug #84274 – “READ COMMITTED does not scale after 36 threads (in 5.6 after 16 threads)”. Sveta Smirnova had a chance to run benchmarks on 144 cores (the largest box I ever had a chance to use for benchmarking had 12 cores, so what do I know…) and the result is clear – READ COMMITTED transaction isolation level does not scale well (comparing to default REPEATABLE READ). It’s counter intuitive for many, but that’s what we have. I doubt MySQL 8 is going to change this (unfortunate) situation.

Bug #84241 – “Potential Race Condition”. This was found in MySQL 5.7 by Rui Gu with a little help from Helgrind.

Bug #84024 – “Optimizer thinks clustered primary key is not covering”. This bug was reported by Manuel Ung. Let me quote a comment by Øystein Grøvlen:”I can agree that the cost model for join buffering is not perfect. If so, I think we should improve this model, not rely on heuristics about covering indexes versus table scan.”I can not agree more! Let’s hope this really happens in MySQL 9 at least.
You probably noted that we see mostly already famous bug reporters mentioned in this list. But names of reporters, their customer or partner status, known achievements, even clear regressions found or patches provided do not force Oracle to fix problems faster these days… They have their own agenda and great plans for MySQL, obviously.I also have my own agenda, so I’ll proceed with this glass of wine…

Fun with Bugs #64 – On MySQL Bug Reports I am Subscribed to, Part IV

I’ve subscribed to more than 15 new MySQL bug reports since the previous post in this series, so it’s time for a new one. I am trying to follow important, funny or hard to process bug reports every day. Here is the list of the most interesting recent ones starting from the latest (with several still not processed properly):

Bug #90211 – “Various warnings and errors when compiling MySQL 8 with Clang”.  Roel Van de Paar and Percona in general continue their QA efforts in a hope to make MySQL 8 better. Current opinion of Oracle engineers on this bug is the following:”First of all, these issues are in protobuf, not MySQL per se. There are some warnings with Clang 6, but since they’re in third-party code, we have simply disabled them when compiling protobuf (will be part of 8.0.11). Optionally, -DUSE_SYSTEM_LIBS=1 will use system protobuf and thus not compile the files in question.As for the crash, we don’t support prerelease compilers (more generally, we support platforms, not compilers). Given the stack trace, it is highly likely that the issue either is in the prerelease Clang, or in protobuf.”Let’s see how it may end up. Roel rarely gives up easily…

Bug #90209 – “Performance regression with > 15K tables in MySQL 8.0 (with general tablespaces)”. Nice regression bug report from Alexander Rubin. It is still “Open”.

Bug #90190 – “Sig=6 assertion in MYSQL_BIN_LOG::new_file_impl |”. Yet another bug report from Percona employee, Ramesh Sivaraman.

Bug #89994 – “INDEX DIRECTORY shown as valid option for InnoDB table creation”. Everybody knows how much I like fine MySQL manual. Even more I like when missing or wrong details are found there, like in this case reported by by colleague from MariaDB, Claudio Nanni.

Bug #89963  – “Slowdown in creating new SSL connection”. Maybe it’s comparing apples to oranges, as stated in one of comments, but I am surprised that this (performance regression) bug report by Rene’ Cannao’ is still “Open”. It requires more attention, IMHO. Speed of connections matters a lot for MySQL.

Bug #89904 – “Can’t change innodb_max_dirty_pages_pct to 0 to flush all pages”. Good intentions to set better default value (applied a bit later than needed) led to the problem. As Simon Mudd put it:”innodb_max_dirty_pages_pct_lwm setting has existed since 5.6. This issue only comes up as by changing the default value to 10 those of us who have ignored it until now never noticed it existed. That is a shame as setting this value to a value other than 0 (e.g. 10 which is the new default) should be better and trigger some background flushing of dirty pages avoiding us hitting innodb_max_dirty_pages_pct which would trigger much more aggressive behaviour which is not really desirable.”

Bug #89876 – “mysqladmin flush-hosts is not safe in GTID mode”. Yet another bug report from Simon Mudd. See also Bug #88720 that highlights even more problems with various FLUSH statements and GTIDs.

Bug #89870 – “Group by optimization not used with partitioned tables”. For some reason this report from Arnaud Adant is still “Open”. As my colleague Richard Stracke stated:”The only solution would be, that the optimizer is able to check, if the condition in the where clause include the whole table (or partition) and in this case use group by optimization.”

Bug #89860 – “XA may lost prepared transaction and cause different between master and slave.” As this (and other, like Bug #88534) bug report from Michael Yang shows, there is still a long way to go until it would be safe to use XA transactions with MySQL.

Bug #89834 – “Replication will not connect on IPv6 – does not function in an IPv6 only environ”. This bug report from Tim St. Pierre is still “Open”.

Bug #89822 – “InnoDB retries open on EINTR error only if innodb_use_native_aio is enabled”. We have patch contributed by Laurynas Biveinis from Percona.

Bug #89758 – “Conversion from ENUM to VARCHAR fails because mysql adds prefix index”. This funny bug was found and reported by Monty Solomon.

Bug #89741 – “Events log Note level messages even for log_warnings=0”. Nikolai Ikhalainen found that this problem happens only in versions 5.5.x and 5.6.x, so chances to see it fixed are low. But I still want to know if this ever happens.

Bug #89696 – “Cyclic dependencies are not resolved properly with cascade removal”. Make sure to check nice discussion that my dear friend Sinisa Milivojevic had with a bug reporter, Andrei Anishchenko, before marking the bug as “Verified”. This regression was most likely caused by a change in MySQL 5.7.21:”InnoDB: An iterative approach to processing foreign cascade operations resulted in excessive memory use. (Bug #26191879, Bug #86573)”

Bug #89625 – “please package the debug symbols *.pdb files!”. Shane Bester always cared about having a way to debug on Windows. Recently I also started to care about this…

— It’s April Fools’ Day today, so why not to make fool of myself assuming that anyone cares about the series of blog posts.

Windows Tools for MySQL DBAs: Basic Minidump Analysis

“To a man with a hammer, everything looks like a nail.”Even though I had written many posts explaining the use of gdb for various MySQL-related tasks, I have to use other OS level troubleshooting tools from time to time. Moreover, as MySQL and MariaDB are still supported and used under Microsoft Windows in production by customers I have to serve them there, and use Windows-specific tools sometimes. So, I decided to start a series of posts (that I promised to my great colleague Vladislav Vaintroub (a.k.a Wlad) who helped me a lot over years and actually switched my attention from Performance Schema towards debuggers) about different Windows tools for MySQL DBAs (and support engineers).Developers (and maybe even power users) on Windows probably know all I plan to describe and way more, by heart, but for me many things were not obvious and took some time to search, try or even ask for some advises… So, this series of posts is going to be useful at least for me (and mostly UNIX users, like me), as a source of hints and links that may save me some time and efforts in the future.In this first post I plan to describe basic installation of “Debugging Tools for Windows” and use of cdb command line debugger to analyze minidumps (that one gets on Windows upon crashes when core-file option is added to my.ini and may get for hanging mysqld.exe process with minimal efforts using different tools) and get backtraces and few other details from them. I also plan to show simple command lines to share with DBAs and users whom you help, that allow to get useful details (more or less full backtraces, crash analysis, OS details etc) for further troubleshooting when/if dumps can not or should not be shared.— I have to confess: I use Microsoft Windows on desktops and laptops. I started from Windows 3.0 back in 1992 and ended with Windows 10 on my wife’s laptop. I use Windows even for work. Today 2 of my 4 machines used for work-related tasks run Windows (64-bit XP on old Dell box I’ve got from MySQL AB back in 2005 and 64-bit Windows 7 on this Acer netbook). At the same time, most of work I have to do since 1992 is related to UNIX of all kinds (from Xenix and SCO OpenDesktop that I connected to from VT220 terminal in at my first job after the university, to recent Linux versions used by customers in production, my Fedora 27 box and Ubuntu 14.04 netbook used as build, Docker, VirtualBox, testing, benchmarking etc servers). I had never become a real powerful user of Windows (no really complex .bat files, PowerShell programming or even Basic macros in Word, domains, shadow copy services usage for backups, nothing fancy). But on UNIX I had to master shell, vi :), some Perl and a lot of command line tools.I had to do some software development on Windows till 2005, built MySQL on Windows sometimes up to 2012 when I joined Percona (that had nothing to do with Windows at all), so I have old version of Visual Studio, some older WinDbg and other debugging tools here and there, but had not used them more than once a year, until recently… Last time I attached WinDbg to anything MySQL-related it was MariaDB 10.1.13, during some troubleshooting related to MDEV-10191.Suddenly in March I’ve got issues from customers related to hanging upon startup/InnoDB recovery and under load, and crashing while using some (somewhat exotic) storage engine, all these – on modern versions of Microsoft Windows, in production. I had no other option but to get and study backtraces (of all threads or crashing threads) and check source code. It would be so easy to get them on Linux (just ask them to install gdb , attach it to hanging mysqld process or point out to the mysqld binary and core, and get the output of thread apply all backtrace, minor details aside). But how to do this on Winsdows, in command line if possible (as I hate to share screenshots and write long explanations on where to click and what to copy/paste)? I had to check in WinDbg, get some failures because of my outdated and incomplete environment (while customer with proper environment provided useful outputs anyway), then, eventually, asked Wlad for some help. Eventually I was able to make some progress.To be ready to do this again next time with confidence, proper test environment and without wasting anybody else’s time, I decided to repeat some of these efforts in clean environment and make notes, that I am going to share in this series of blog posts. Today I’ll concentrate on installing current “Debugging Tools for Windows” and using cdb from them to process minidumps.1. Installing “Debugging Tools for Windows”There is a nice, easy to find document from Microsoft on how to get cdb and other debugging tools for Windows. For recent versions you just have to download Windows 10 SDK and then install these tools (and everything else you may need) from it. Proceed to this page, read the details, click on “Download .EXE” to get winsdksetup.exe , start it and select “Debugging Tools for Windows” when requested to select the features. Eventually you’ll get some 416+ MB downloaded and installed by default in C:\Program Files (x86)\Windows Kits\10\Debuggers\. (on default 64-bit Windows installation with C: as system disk). Quick check shows I have everything I need:C:\Program Files (x86)\Windows Kits\10\Debuggers\x64>dir…11/10/2017  11:55 PM           154,936 cdb.exe…11/10/2017  11:55 PM           576,312 windbg.exe…Here is the list of most useful cdb options for the next step:C:\Program Files (x86)\Windows Kits\10\Debuggers\x64>cdb /?cdb version 10.0.16299.91usage: cdb [options]Options:  <command-line> command to run under the debugger  -? displays command line help text…  -i <ImagePath> specifies the location of the executables that generated the                 fault (see _NT_EXECUTABLE_IMAGE_PATH)…  -lines requests that line number information be used if present…  -logo <logfile> opens a new log file…  -p <pid> specifies the decimal process ID to attach to…  -pv specifies that any attach should be noninvasive…  -y <SymbolsPath> specifies the symbol search path (see _NT_SYMBOL_PATH)  -z <CrashDmpFile> specifies the name of a crash dump file to debug…Environment Variables:    _NT_SYMBOL_PATH=[Drive:][Path]        Specify symbol image path….Control Keys:     <Ctrl-B><Enter> Quit debugger…Remember Crtl-B key combination as a way to quit from cdb. I looked as funny as the beginner vi user few times, clicking on everything to get out of that tool…2. Basic Use of cdb to Process MinidumpLet’s assume you’ve got mysqld.dmp minidump file (a kind of “core” file on UNIX, but better, at least smaller usually) created during some crash. Depending on binaries used, you may need to make sure you have .PDB files in some directory, for the mysqld.exe binary and all .dll files for plugins/extra storage engines used, in some directory. Default path to .PDB files is defined by the _NT_SYMBOL_PATH environment variable and may include multiple directories ad URLs.Initially I’ve got advice to set this environment variable as follows:set _NT_SYMBOL_PATH=srv*c:\symbols* assumes that I have a collection of .PDB files in c:\symbols on some locally available server and rely on Microsoft’s symbols server for the rest. For anything missing we can always add -y option to point to some directory with additional .PDB files. Note that MariaDB provides .pdb files along with .exe in .msi installer, not only in .zip file with binaries.So, if your mysqld.dmp file is located in h:\, mysqld.exe for the same version as generated that minidump is located in p:\software and all related .dll files and .pdb files for them all are also there, the command to get basic details about the crash in file h:\out.txt would be the following:cdb -z h:\mysqld.dmp -i p:\software -y p:\software -logo h:\out.txt -c “!sym prompts;.reload;.ecxr;q”You can click on every option underlined above to get details. It produces output like this:C:\Program Files (x86)\Windows Kits\10\Debuggers\x64>cdb -z h:\mysqld.dmp -i p:\software -y p:\software -logo h:\out.txt -c “!sym prompts;.reload;.ecxr;q”Microsoft (R) Windows Debugger Version 10.0.16299.91 AMD64Copyright (c) Microsoft Corporation. All rights reserved.Loading Dump File [h:\mysqld.dmp]User Mini Dump File: Only registers, stack and portions of memory are available************* Path validation summary **************Response                         Time (ms)     LocationOK                                             p:\software************* Path validation summary **************Response                         Time (ms)     LocationOK                                             p:\softwareDeferred                                       srv*c:\symbols* search path is: p:\software;srv*c:\symbols* search path is: p:\softwareWindows 10 Version 14393 MP (4 procs) Free x64Product: Server, suite: TerminalServer SingleUserTS10.0.14393.206 (rs1_release.160915-0644)Machine Name:Debug session time: …System Uptime: not availableProcess Uptime: 0 days X:YY:ZZ.000……………………………………..This dump file has an exception of interest stored in it.The stored exception information can be accessed via .ecxr.(1658.fd0): Access violation – code c0000005 (first/second chance not available)ntdll!NtGetContextThread+0x14:00007fff`804a7d84 c3              ret0:053> cdb: Reading initial command ‘!sym prompts;.reload;.ecxr;q’quiet mode – symbol prompts on……………………………………..rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000006rdx=000001cf0ac2e118 rsi=000001cf0abeeef8 rdi=000001cf0ac2e118rip=00007fff5f313b0d rsp=000000653804e2b0 rbp=000001cf165c9cc8 r8=0000000000000000  r9=00007fff5f384448 r10=000000653804ef70r11=000000653804eb28 r12=0000000000000000 r13=000001cf0ab49d48r14=0000000000000000 r15=000001cf0b083028iopl=0         nv up ei pl zr na po nccs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246ha_spider!spider_db_connect+0xdd:00007fff`5f313b0d 8b1498          mov     edx,dword ptr [rax+rbx*4] ds:00000000`00000000=????????quit:C:\Program Files (x86)\Windows Kits\10\Debuggers\x64>that also goes to the file pointed out by the -logo option. Here we have some weird crash in Spider engine of MariaDB that  is not a topic of current post.If you think the crash is related to some activity of other threads, you can get all unique stack dumps with the following options:cdb -lines -z h:\mysqld.dmp -i p:\software -y p:\software -logo h:\out.txt -c “!sym prompts;.reload;!uniqstack -p;q”
This is how the backtrace of slave SQL thread may look like, note files with line numbers for each frame (-lines option):
. 44  Id: 1658.1584 Suspend: 0 Teb: 00000065`32185000 Unfrozen      Priority: 0  Priority class: 32Child-SP          RetAddr           Call Site00000065`353fed08 00007fff`8046d119 ntdll!NtWaitForAlertByThreadId+0x1400000065`353fed10 00007fff`7cbd8d78 ntdll!RtlSleepConditionVariableCS+0xc900000065`353fed80 00007ff6`2d7d62e7 KERNELBASE!SleepConditionVariableCS+0x2800000065`353fedb0 00007ff6`2d446c8e mysqld!pthread_cond_timedwait(struct _RTL_CONDITION_VARIABLE * cond = 0x000001ce`66805688, struct _RTL_CRITICAL_SECTION * mutex = 0x000001ce`668051b8, struct timespec * abstime = <Value unavailable error>)+0x27 [d:\winx64-packages\build\src\mysys\my_wincond.c @ 85](Inline Function) ——–`——– mysqld!inline_mysql_cond_wait+0x61 [d:\winx64-packages\build\src\include\mysql\psi\mysql_thread.h @ 1149]00000065`353fede0 00007ff6`2d4b1718 mysqld!MYSQL_BIN_LOG::wait_for_update_relay_log(class THD * thd = <Value unavailable error>)+0xce [d:\winx64-packages\build\src\sql\ @ 8055]00000065`353fee90 00007ff6`2d4af03f mysqld!next_event(struct rpl_group_info * rgi = 0x000001ce`667fe560, unsigned int64 * event_size = 0x00000065`353ff008)+0x2b8 [d:\winx64-packages\build\src\sql\ @ 7148]00000065`353fef60 00007ff6`2d4bb038 mysqld!exec_relay_log_event(class THD * thd= 0x000001ce`6682ece8, class Relay_log_info * rli = 0x000001ce`66804d58, structrpl_group_info * serial_rgi = 0x000001ce`667fe560)+0x8f [d:\winx64-packages\build\src\sql\ @ 3866]00000065`353ff000 00007ff6`2d7d35cb mysqld!handle_slave_sql(void * arg = 0x000001ce`66803430)+0xa28 [d:\winx64-packages\build\src\sql\ @ 5145]00000065`353ff780 00007ff6`2d852d51 mysqld!pthread_start(void * p = <Value unavailable error>)+0x1b [d:\winx64-packages\build\src\mysys\my_winthread.c @ 62](Inline Function) ——–`——– mysqld!invoke_thread_procedure+0xe [d:\th\minkernel\crts\ucrt\src\appcrt\startup\thread.cpp @ 91]00000065`353ff7b0 00007fff`80338364 mysqld!thread_start<unsigned int (void * parameter = 0x00000000`00000000)+0x5d [d:\th\minkernel\crts\ucrt\src\appcrt\startup\thread.cpp @ 115]00000065`353ff7e0 00007fff`804670d1 kernel32!BaseThreadInitThunk+0x1400000065`353ff810 00000000`00000000 ntdll!RtlUserThreadStart+0x21
For crash analysis usually !analyze command is also used:

cdb -lines -z h:\mysqld.dmp -i p:\software -y p:\software -logo h:\out.txt -c “!sym prompts;.reload;!analyze -v;q”
It may give some details about the exception happened:

FAULTING_IP:ha_spider!spider_db_connect+dd00007fff`5f313b0d 8b1498          mov     edx,dword ptr [rax+rbx*4]EXCEPTION_RECORD:  (.exr -1)ExceptionAddress: 00007fff5f313b0d (ha_spider!spider_db_connect+0x00000000000000dd)   ExceptionCode: c0000005 (Access violation)  ExceptionFlags: 00000000NumberParameters: 2   Parameter[0]: 0000000000000000   Parameter[1]: 0000000000000000Attempt to read from address 0000000000000000DEFAULT_BUCKET_ID:  NULL_POINTER_READPROCESS_NAME:  mysqld.exeERROR_CODE: (NTSTATUS) 0xc0000005 – <Unable to get error code text>

STACK_TEXT:00000065`3804e2b0 00007fff`5f3132ad : 00000000`00000000 000001cf`1741ab68 000001cf`0b083028 000001cf`0ac2e118 : ha_spider!spider_db_connect+0xdd00000065`3804e330 00007fff`5f3117f8 : 000001ce`669107c8 000001cf`0ac2e118 000001cf`1741ab68 00000000`00000001 : ha_spider!spider_db_conn_queue_action+0xad00000065`3804ea20 00007fff`5f31a1ee : 00000000`00000000 000001cf`0abeeef8 00000000`00000000 000001cd`c0b30000 : ha_spider!spider_db_before_query+0x10800000065`3804eaa0 00007fff`5f31a0bf : 00000000`00000000 00000000`00000000 00000065`3804ec70 00000000`00000038 : ha_spider!spider_db_set_names_internal+0x11e00000065`3804eb30 00007fff`5f369b4e : 00000000`00000000 00000065`3804ec70 00007fff`5f387f08 00000000`00000000 : ha_spider!spider_db_set_names+0x3f00000065`3804eb70 00007fff`5f32f5f1 : 00000000`00000001 00000065`00000000 41cfffff`00000001 00000000`00000001 : ha_spider!spider_mysql_handler::show_table_status+0x15e00000065`3804ece0 00007fff`5f3222e8 : 00000000`00000001 00000065`00000000 00000000`5ab175cd 000001cf`0b0886f8 : ha_spider!spider_get_sts+0x20100000065`3804edb0 00007ff6`2d7d35cb : 00000000`00000057 000001cf`0ab49d48 00000000`00000000 00007fff`5f321c10 : ha_spider!spider_bg_sts_action+0x6d800000065`3804fa30 00007ff6`2d852d51 : 000001cf`17008fe0 000001cf`0aa3fef0 00000000`00000000 00000000`00000000 : mysqld!pthread_start+0x1b00000065`3804fa60 00007fff`80338364 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : mysqld!thread_start<unsigned int (__cdecl*)(void * __ptr64)>+0x5d00000065`3804fa90 00007fff`804670d1 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x1400000065`3804fac0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21

Finally (for this post), this is how we can get information about a crashing thread, including details about local variables (like full backtrace in gdb). We apply !for_each_frame extension and use dv to “display variable”:
cdb -z h:\mysqld.dmp -i p:\software -y p:\software -logo h:\out.txt -c “!sym prompts;.reload;.ecxr;!for_each_frame dv /t;q”
The result will include details about each frame, parameters and local variables, like this:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _00 00000065`3804e2b0 00007fff`5f3132ad ha_spider!spider_db_connect+0xddstruct st_spider_share * share = 0x000001cf`165c9cc8struct st_spider_conn * conn = 0x000001cf`0ac2e118int link_idx = 0n0int error_num = <value unavailable>class THD * thd = 0x000001cf`0abeeef8int64 connect_retry_interval = <value unavailable>int connect_retry_count = <value unavailable>int64 tmp_time = <value unavailable>_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _01 00000065`3804e330 00007fff`5f3117f8 ha_spider!spider_db_conn_queue_action+0xadstruct st_spider_conn * conn = 0x000001cf`0ac2e118int error_num = 0n0char [1532] sql_buf = char [1532] “”class spider_string sql_str = class spider_stringclass spider_db_result * result = <value unavailable>struct st_spider_db_request_key request_key = struct st_spider_db_request_key_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _02 00000065`3804ea20 00007fff`5f31a1ee ha_spider!spider_db_before_query+0x108struct st_spider_conn * conn = 0x000001cf`0ac2e118int * need_mon = 0x000001cf`1741ab68int error_num = 0n0bool tmp_mta_conn_mutex_lock_already = trueclass ha_spider * spider = <value unavailable>bool tmp_mta_conn_mutex_unlock_later = <value unavailable>_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _03 00000065`3804eaa0 00007fff`5f31a0bf ha_spider!spider_db_set_names_internal+0x11estruct st_spider_transaction * trx = 0x000001cf`0b083028struct st_spider_share * share = 0x000001cf`0ab49d48struct st_spider_conn * conn = 0x000001cf`0ac2e118int all_link_idx = 0n0int * need_mon = 0x000001cf`1741ab68bool tmp_mta_conn_mutex_lock_already = true_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _04 00000065`3804eb30 00007fff`5f369b4e ha_spider!spider_db_set_names+0x3fclass ha_spider * spider = <value unavailable>struct st_spider_conn * conn = <value unavailable>int link_idx = <value unavailable>_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _05 00000065`3804eb70 00007fff`5f32f5f1 ha_spider!spider_mysql_handler::show_table_status+0x15eclass spider_mysql_handler * this = 0x000001cf`0a15dd00int link_idx = 0n0int sts_mode = 0n1unsigned int flag = 1int error_num = 0n1struct st_spider_share * share = 0x000001cf`0ab49d48struct st_spider_conn * conn = 0x000001cf`0ac2e118class spider_db_result * res = <value unavailable>unsigned int64 auto_increment_value = 0unsigned int pos = 0struct st_spider_db_request_key request_key = struct st_spider_db_request_keystruct st_spider_db_request_key request_key = struct st_spider_db_request_key_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _06 00000065`3804ece0 00007fff`5f3222e8 ha_spider!spider_get_sts+0x201struct st_spider_share * share = 0x000001cf`0ab49d48int link_idx = 0n0int64 tmp_time = 0n1521579469class ha_spider * spider = 0x00000065`3804ef70double sts_interval = 10int sts_mode = 0n1int sts_sync = 0n0int sts_sync_level = 0n2unsigned int flag = 0x18int error_num = <value unavailable>int get_type = 0n1struct st_spider_patition_handler_share * partition_handler_share = <value unavailable>double tmp_sts_interval = <value unavailable>struct st_spider_share * tmp_share = <value unavailable>int tmp_sts_sync = <value unavailable>class ha_spider * tmp_spider = <value unavailable>int roop_count = <value unavailable>int tmp_sts_mode = <value unavailable>class THD * thd = <value unavailable>_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _07 00000065`3804edb0 00007ff6`2d7d35cb ha_spider!spider_bg_sts_action+0x6d8void * arg = 0x000001cf`0ab49d48int error_num = 0n0class ha_spider spider = class ha_spiderunsigned int * conn_link_idx = 0x000001cf`1741ab78unsigned char * conn_can_fo = 0x000001cf`1741ab80 “— memory read error at address 0x000001cf`1741ab80 —“struct st_spider_conn ** conns = 0x000001cf`1741ab70int * need_mons = 0x000001cf`1741ab68int roop_count = 0n0char ** conn_keys = 0x000001cf`1741ab88class THD * thd = 0x000001cf`0abeeef8class spider_db_handler ** dbton_hdl = 0x000001cf`1741ab90struct st_spider_transaction * trx = 0x000001cf`0b083028struct st_mysql_mutex spider_global_trx_mutex = <value unavailable>_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _08 00000065`3804fa30 00007ff6`2d852d51 mysqld!pthread_start+0x1bvoid * p = <value unavailable>void * arg = 0x000001cf`0ab49d48<function> * func = 0x00007fff`5f321c10_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _09 (Inline Function) ——–`——– mysqld!invoke_thread_procedure+0xevoid * context = 0x000001cf`17008fe0_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _0a 00000065`3804fa60 00007fff`80338364 mysqld!thread_start<unsigned int (__cdecl*)(void * __ptr64)>+0x5dvoid * parameter = 0x00000000`00000000<function> * procedure = 0x00007ff6`2d7d35b0_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _0b 00000065`3804fa90 00007fff`804670d1 kernel32!BaseThreadInitThunk+0x14Unable to enumerate locals, Win32 error 0n87Private symbols (symbols.pri) are required for locals.Type “.hh dbgerr005” for details._ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _0c 00000065`3804fac0 00000000`00000000 ntdll!RtlUserThreadStart+0x21Unable to enumerate locals, Win32 error 0n87Private symbols (symbols.pri) are required for locals.Type “.hh dbgerr005” for details.
Stay tuned. I keep working on complex MySQL/MariaDB problems under Windows, so soon will have few more findings and links to share.

Fun with Bugs #63 – On Bugs Detected by ASan

Among other things Geir Hoydalsvik stated in his nice post yesterday: “We’ve fixed a number of bugs detected by UBsan and Asan.”
This is indeed true, I already noted many related bugs fixed in recent MySQL 8.0.4. But I think that a couple of details are missing in the blog post. First of all, there still a notable number of bugs detected by ASan or noted in builds with ASan that remain “Verified”. Second, who actually found and reported these bugs?I decided to do a quick search and present my summary to clarify these details. Let me start with the list of “Verified” or “Open” bugs in public MySQL bugs database, starting from the oldest one:

Bug #69715 – “UBSAN: Item_func_mul::int_op() mishandles 9223372036854775809*-1”. The oldest related “Verified” bug I found was reported back in 2013 by Arthur O’Dwyer. Shane Bester from Oracle kindly keeps checking it with recent and upcoming releases, so we know that even ‘9.0.0-dmr-ubsan’ (built on 20 October 2017) was still affected.

Bug #80309 – “some innodb tests fail with address sanitizer (WITH_ASAN)”. It was reported by Richard Prohaska and remains “Verified” for more than two years already.

Bug #80581 – “rpl_semi_sync_[non_]group_commit_deadlock crash on ASan, debug”. This bug reported by Laurynas Biveinis from Percona two years ago is still “Verified”.

Bug #81674 – “LeakSanitizer-enabled build fails to bootstrap server for MTR”. This bug reported by  Laurynas Biveinis affects only MySQL 5.6, but still, why not to backport the fix from 5.7?

Bug #82026 – “Stack buffer overflow with –ssl-cipher=<more than 4K characters>”. Bug detected by ASan was noted by Yura Sorokin from Percona and reported by Laurynas Biveinis.

Bug #82915 – “SIGKILL myself when using innodb_limit_optimistic_insert_debug=2 and drop table”. ASan debug builds are affected. This bug was reported by Roel Van de Paar from Percona.

Bug #85995 – “Server error exit due to empty datadir causes LeakSanitizer errors”. This bug in MySQL 8.0.1 (that had to affect anyone who runs tests on ASan debug builds on a regular basis) was reported by Laurynas Biveinis and stay “Verified” for almost a year.

Bug #87129 – “Unstable test main.basedir”. This test problem reported by Laurynas Biveinis affects ASan builds, among others. See also his Bug #87190 – “Test main.group_by is unstable”.

Bug #87201 – “XCode 8.3.3+ -DWITH_UBSAN=ON bundled protobuf build error”. Yet another (this time macOS-specific) bug found by Laurynas Biveinis.

Bug #87295 – “Test group_replication.gr_single_primary_majority_loss_1 produces warnings”. Potential bug in group replication noted by Laurynas Biveinis in ASan builds.

Bug #87923 – “ASan reporting a memory leak on merge_large_tests-t”. This bug by Laurynas Biveinis is still “Verified”, while Tor Didriksen’s comment states that it it resolved with the fix for Bug #87922 (that is closed as fixed in MySQL 8.0.4). Why not to close this one also?

Bug #89438 – “LeakSanitizer errors on xplugin unit tests”. As Laurynas Biveinis found, X Plugin unit tests report errors with LeakSanitizer.

Bug #89439 – “LeakSanitizer errors on GCS unit tests”. yet another bug report for MySQL 8.0.4 by Laurynas Biveinis.

Bug #89961 – “add support for clang ubsan”. This request was made by Tor Didriksen from Oracle. It is marked as “fixed in 8.0.12”. It means we may get MySQL 8.0.11 released soon. That’s why I decided to mention the bug here.
There were also few other test failures noted on ASan debug builds. I skipped them to make this post shorter.Personally I do not run builds or tests with ASan on a regular basis. I appreciate Oracle’s efforts to make code warning-free, UBSan- and ASan-clean, and fix bugs found with ASan. But I’d also want them to process all/most of related bugs in public database properly before making announcements of new related achievement, and clearly admit and appreciate a lot of help and contribution from specific community members (mostly Laurynas Biveinis in this case).Percona engineers seem to test ASan builds of MySQL 5.7 and 8.0 (or Percona’s closely related versions) regularly, for years, and contribute back public bug reports. I suspect they found way more related bugs than internal Oracle’s QA. I think we should explicitly thank them for this contribution that made MySQL better!

Checking User Threads With gdb in MySQL 5.7+

In one of my gdb-related posts last year I noted that there is no more simple global list of user threads in MySQL 5.7+:”I had highlighted Global_THD_manager singleton also as during my next gdb sessions I had found out that simple global list of threads is also gone and in 5.7 everything is done via that Global_THD_manager. This is a topic for some other post, though.”In that post and many times later when I had to deal with MySQL 5.7+ I just checked OS threads one by one in gdb using thread  1 … thread N commands. This is not efficient at best, as I also hit numerous background threads that I often do not care about. So, a couple of weeks ago I finally decided to get back to this topic and find out how to check just user threads one by one in recent MySQL versions. I had a nice hint by Shane Bester on how to get information about $i-th thread (that he shared in one of his comments to my Facebook post):set $value = (THD**)(Global_THD_manager::thd_manager-> + (sizeof(THD**) * $i))I’ve attached gdb to an instance of Percona Server 5.7.x that I had running in my CentOS 6.9 VM and tried few commands to check types and content of the Global_THD_manager elements:
(gdb) p Global_THD_manager::thd_manager$1 = (Global_THD_manager *) 0x7fab087fd000(gdb) p Global_THD_manager::thd_manager->thd_list$2 = {m_size = 2, m_capacity = 500, m_buff = {{      data = “0060b\344\252\177000000\220i\344\252\177000000\200x\344\252\177”, ’00’ <repeats 3977 times>, align = {<No data fields>}}},  m_array_ptr = 0x7fab087fd010, m_psi_key = 0}
So, we see that internally there is some array of elements thd_list with m_size items (2 in my case) probably stored in some pre-allocated buffer of m_capacity (500) elements, stored in The type of elements is not clear, but we can try Shane’s hint and assume that they are of type THD**. Let’s try to check what we see there after type castings:(gdb) p (THD**)(Global_THD_manager::thd_manager->$4 = (THD **) 0x7fab087fd010(gdb) p  *(THD**)(Global_THD_manager::thd_manager->$5 = (THD *) 0x7faae4623000(gdb) p  **(THD**)(Global_THD_manager::thd_manager->$6 = {<MDL_context_owner> = {    _vptr.MDL_context_owner = 0x1c51f50}, <Query_arena> = {…So, we get reasonable addresses and when we dereference the resulting THD** pointer twice we indeed get a structure that looks like THD of MySQL 5.7+ (it’s very different, say, in MariaDB 10.1.x), with reasonable content (that is huge and skipped above).I’ve tried to get processlist id of thread based on findings of that post using intermediate gdb variables:(gdb) set $ppthd = (THD**)(Global_THD_manager::thd_manager-> p *($ppthd)$7 = (THD *) 0x7faae4623000…(gdb) set $pthd = *($ppthd)(gdb) p $pthd->m_thread_id$10 = 5and then directly, using offsets and checking for security contexts of threads:(gdb) p  (**(THD**)(Global_THD_manager::thd_manager->$14 = {m_ptr = 0x7faae463b060 “myuser”, m_length = 6, m_charset = 0x1d21760,  m_alloced_length = 8, m_is_alloced = true}(gdb) p  (**(THD**)(Global_THD_manager::thd_manager-> + (sizeof(THD**)))).m_main_security_ctx.m_user$15 = {m_ptr = 0x7faae46b1090 “root”, m_length = 4, m_charset = 0x1d21760,  m_alloced_length = 8, m_is_alloced = true}(gdb) p  (**(THD**)(Global_THD_manager::thd_manager-> + (sizeof(THD**)))).m_thread_id$16 = 9to confirm that I correctly get user names and thread ids for both 2 user threads I had in that “list”. As usual Shane Bester was right!Now, if you want to get more details about Global_THD_manager, you can just check the sql/mysqld_thd_manager.h file. I was interested mostly in the following:  int get_num_thread_running() const { return num_thread_running; }  uint get_thd_count() const { return global_thd_count; }  static Global_THD_manager *thd_manager;  // Array of current THDs. Protected by LOCK_thd_list.  typedef Prealloced_array<THD*, 500, true> THD_array;  THD_array thd_list;  // Array of thread ID in current use. Protected by LOCK_thread_ids.  typedef Prealloced_array<my_thread_id, 1000, true> Thread_id_array;  Thread_id_array thread_ids;First of all, how consistent it is to use both int and uint data types for values that are always >=0… The fact that our thd_list elements is actually some template-based container, Prealloced_array, it also interesting, as it would be useful to find out how it is implemented. We can find all relevant details in the include/prealloced_array.h file. I’d like to highlight the following here:”The interface is chosen to be similar to std::vector.”…private:  size_t         m_size;  size_t         m_capacity;  // This buffer must be properly aligned.  my_aligned_storage<Prealloc * sizeof(Element_type), MY_ALIGNOF(double)>m_buff;Element_type *m_array_ptr;To summarize, MySQL 5.7+ uses more C++ now, with templates, singletons, iterators and more, but still Oracle prefers to implement their own container types instead of using some standard ones. One of these generic types, Prealloced_array, is widely used and is easy to deal with in gdb, as long as you know the element type.

On InnoDB’s FULLTEXT Indexes

I had recently written about InnoDB features that I try to avoid by all means if not hate: “online” DDL and persistent optimizer statistics. Time to add one more to the list – FULLTEXT indexes.This feature had a lot of problems when initially introduced in MySQL 5.6. There was a nice series of blog posts about the initial experience with it by my colleague from Percona (at that times) Ernie Souhrada: part I, part II, and part III. Many of the problems mentioned there were resolved or properly documented since that times, but even more were discovered. So, InnoDB FULLTEXT indexes may be used, with care, when MyISAM or other engines/means to add fulltext search is not an option. The list of bugs that are still important and must be taken into account is presented below.What forced me to get back to this feature recently and hate it sincerely is one customer issue that led to this bug report: MDEV-14773  – “ALTER TABLE … MODIFY COLUMN … hangs for InnoDB table with FULLTEXT index”. Note that I have to refer to MariaDB bug report here, as related upstream Bug #88844 is hidden from community (probably considered a shame, if not a security problem)! The bug is simple: if one applies any ALTER to the InnoDB table with FULLTEXT index, even not related that index and columns in in any way, chances are high that this ALTER may cause a kind of hang/infinite loop/conflict of the thread that tries to drop temporary table used by ALTER, as one of last steps, and FTS background optimize thread. Similar to other two problematic features, new background threads were introduced and their cooperation with other threads in InnoDB seems to be not that well designed/implemented.There are many other bugs to take into account if you ever plan to add any single FULLTEXT index to your InnoDB table. Here is the list of the most important ones, mostly still “Verified” or open and ignored, that I collected during one of calm night shifts this week:

Bug #78048 – “INNODB Full text Case sensitive not working”. This bug was fixed only recently, in MySQL 5.6.39, 5.7.21, and 8.0.4.

Bug #83776 – “InnoDB FULLTEXT search returns incorrect result for operators on ignored words”. Still “Verified” on all GA versions and 8.0.x.

Bug #76210 – “InnoDB FULLTEXT index returns wrong results for key/value pair documents”. This bug was reported by Justin Swanhart 3 years ago, quickly verified and then seems to be ignored.

Bug #86036 – “InnoDB FULLTEXT index has too strict innodb_ft_result_cache_limit max limit”. I reported this bug 10 months ago, and it was immediately “Verified”. It seems FULLTEXT indexes are hardly useful in general for large InnoDB tables because of this limitation.

Bug #78977 – “Enable InnoDB fulltext index to use generated FTS_DOC_ID column”. This is a feature request (still “Open”) to get rid of this well known limitation/specific column.

Bug #86460 – “Deleted DOCID are not maintained during OPTIMIZE of InnoDB FULLTEXT tables”. If you want to get rid of deleted DOC_IDs in the INNODB_FT_DELETED, better just run ALTER TABLE … ENGINE=InnoDB.

Bug #75763 – “InnoDB FULLTEXT index reduces insert performance by up to 6x on JSON docs”. yet another verified bug report by Justin Swanhart.

Bug #69762 – “InnoDB fulltext match against in boolean mode misses results on join”. Let me quote last comment there:”Since innodb doesn’t support fulltext search on columns without fulltext index, and it is very complicated to support search on columns in multiple fulltext indexes in optimizer, it won’t be fixed.We admit it’s a point innodb fulltext is not compatible with myisam.”

Bug #85880 – “Fulltext query is too slow when each ngram token match a lot of documents”. This bug is still “Open”.

Bug #78485 – “Fulltext search with char * produces a syntax error with InnoDB”. Yet another verified regression comparing to MyISAM FULLTEXT indexes. Nobody cares for 2.5 years.

Bug #80432 – “No results in fulltext search for top level domain in domain part of email “. It ended up as “Won’t fix”, but at least a workaround was provided by Oracle developer.

Bug #81819 – “ALTER TABLE…LOCK=NONE is not allowed when FULLTEXT INDEX exists”. Online ALTER just does not work for tables with FULLTEXT indexes. This is a serious limitation.

Bug #72132 – “Auxiliary tables for InnoDB FTS indexes are always created in shared tablespace”. This my bug report was fixed in .5.6.20+ and 5.7.5+, but the fact that this regression was not noted for a long time internally says a lot about the way the feature was developed and maintained.

Bug #83560  – “InnoDB FTS – output from mysqldump extremely slow and blocks unrelated inserts”. I have yet to check the metadata locks set when the table with FULLTEXT index is used in various SQL statements, but from this “Verified” report it is clear that just lading a dump of a table with FULLTEXT indexes may work too slow for any large table.

Bug #71551 – “ft_boolean_syntax has no impact on InnoDB FTS”. yet another inconsistency with MyISAM FULLTEXT indexes that was reported 4 years ago and “Verified”, but still ignored after that.

Bug #83741 – “InnoDB: Failing assertion: lock->magic_n == 22643”. Surely, debug assertions can be ignored, but in most cases they are in the code for a good reason. This failure was reported by Roel Van de Paar from Percona.

Bug #83397 – “INSERT INTO … SELECT FROM … fails if source has > 65535 rows on FTS”. This “Verified” bug alone, reported by Daniël van Eeden, makes InnoDB FULLTEXT indexes hardly usable in production for large tables.

Bug #80296 – “FTS query exceeds result cache limit”. The bug is “Closed” silently (by the bug reporter maybe, Monty Solomon?), but users report that recent enough versions like 5.6.35 and 5.7.17 are still affected. See also Bug #82971 (no fix for MySQL 5.6.x for sure).

Bug #85876 – “Fulltext search can not find word which contains “,” or “.”.  Still “Verified” for 1 months.

Bug #68987 – “MySQL crash with InnoDB assertion failure in file”. Crash was reported in MySQL 5.6.10, not repeatable. Then (different?) assertion failure was reported in debug builds only in MySQL 5.6.21+, and verified. Not sure what’s going on with this bug report…

Bug #83398 – “Slow and unexpected explain output on FTS”. The fact that EXPLAIN may be slow when the table with FULLTEXT index is involved is now documented, so this report by Daniël van Eeden is closed.

Bug #81930 – “incorrect result with InnoDB FTS and subquery”. This bug report about wrong results by Sergei Golubchik from MariaDB was immediately “Verified”, but ignored since that time.

Bug #80347 – “mysqldump backup restore fails due to invalid FTS_DOC_ID (Error 182 and 1030)”. There is a workaround based on mydumper/myloader at least…
To summarize, InnoDB FULLTEXT indexes is one of the most problematic InnoDB features for any production use because:
There are all kinds of serious bugs, from wrong results to hangs, debug assertions and crashes, that do not seem to get any internal priority and stay “Verified” for years.
There are performance regressions and missing features comparing to MyISAM FULLTEXT indexes, so migration may cause problems.
InnoDB FULLTEXT indexes are not designed to work with really large tables/result sets.
You should expect problems during routine DBA activities, like ALTERing tables or dumps and restores when any table with InnoDB FULLTEXT index is involved. 
If you still plan/have to use it, please, make sure to use the latest MySQL version, check the list above carefully and test/check the results of fulltext searches and routine DBA operations like altering the table. You may get a lot of surprises. Consider alternatives like Sphinx seriously.

TEL/電話+86 13764045638
QQ 47079569