Author: Jean-François Gagné

JFG Posted on the Percona Community Blog – A Nice Feature in MariaDB 10.3: no InnoDB Buffer Pool in Core Dumps

I just posted an article on the Percona Community Blog.  You can access it by following this link:

A Nice Feature in MariaDB 10.3: no InnoDB Buffer Pool in Core Dumps

I do not know if I will stop publishing posts on my personal blog or use both; I will see how things go…  In the rest of this post, I will share why I published there and how things went in the process.


A Nice Feature in MariaDB 10.3: no InnoDB Buffer Pool in Core Dumps

MariaDB 10.3 is now generally available (10.3.7 was released GA on 2018-05-25). The article What’s New in MariaDB Server 10.3 by the MariaDB Corporation lists three key improvements in 10.3: temporal data processing, Oracle compatibility features, and purpose-built storage engines. Even if I am excited about MyRocks and curious about Spider, I am also very interested in less flashy but still very important changes that make running the database in production easier. This post describes one such improvement: no InnoDB Buffer Pool in core dumps.
Hidden in the Compression section of the page Changes & Improvements in MariaDB 10.3 from the Knowledge Base, we can read:
On Linux, shrink the core dumps by omitting the InnoDB buffer pool
This is it, no more details, only a link to MDEV-10814 (Feature request: Optionally exclude large buffers from core dumps). This Jira ticket was opened on 2016-09-15 by a well-known MariaDB Support Engineer: Hartmut Holzgraefe. I know Booking.com had been asking for this feature for a long time; this is even mentioned by Hartmut in a GitHub comment.
The ways this feature eases operations with MariaDB are well documented by Hartmut in the description of the Jira ticket:

it needs less available disk space to store core dumps,
it reduces the time required to write core dumps (and hence restart MySQL after a crash),
it improves security by omitting a substantial amount of user data from core dumps.

In addition to that, I would add that smaller core dumps are easier to share in tickets. I am often asked by support engineers to provide a core dump in relation to a crash, and my reply is “How do you expect me to provide you with a 192 GB file ?” (or even bigger files, as I saw MySQL/MariaDB being used on servers with 384 GB of RAM). This often leads to a “Let me think about this and I will come back to you” answer. Avoiding the InnoDB Buffer Pool in core dumps makes this less of an issue for both DBAs and support providers.
Before continuing the discussion on this improvement, I need to give more details about what a core dump is.
What is a Core Dump and Why is it Useful ?
By looking at the Linux manual page for core (and core dump file), we can read:
[A core dump is] a disk file containing an image of the process’s memory at the time of termination. This image can be used in a debugger to inspect the state of the program at the time that it terminated.
The Wikipedia article for core dump also tells us that:

the core dump includes key pieces of program state such as processor registers, memory management details, and other processor and operating system flags and information,
the name comes from magnetic core memory, the principal form of random access memory from the 1950s to the 1970s, and the name has remained even though magnetic core technology is obsolete.

So a core dump is a file that can be very useful for understanding the context of a crash. The exact details of how to use a core dump have already been discussed in many places and are beyond the scope of this post. The interested reader can learn more by following these links:

Getting MySQL Core file on Linux
How to Produce a Full Stack Trace for mysqld
MySQL is crashing: a support engineer’s point of view
Database issue cheat sheet (including gdb commands for using core dumps)
What to Do If MySQL Keeps Crashing
Debugging mysqld under gdb

Now that we know more about core dumps, we can get back to the discussion of the new feature.

The no InnoDB Buffer Pool in Core Dump Feature from MariaDB 10.3

As already pointed out above, there are very few details in the release notes about how this feature works. By digging in MDEV-10814, following pointers to pull requests (#333, #364, #365, …), and reading the commit messages, I was able to gather this:

An initial patch was written by Hartmut in 2015.
It passes the MADV_DONTDUMP flag to the madvise system call (available in Linux kernel 3.4 and higher).
Hartmut’s patch was rebased by Daniel Black, a well-known MariaDB Community Contributor (pull request #333).
The first work by Daniel had a configuration parameter to allow including/excluding the InnoDB Buffer Pool in/from core dumps, but after a discussion in pull request #333, it was decided that the RELEASE builds would not put the InnoDB Buffer Pool in core dumps and that DEBUG builds would include it (more about this below).
The function buf_madvise_do_dump was added but is never invoked by the server; it is there to be called from a debugger to re-enable full core dumping if needed (from this commit message).
The InnoDB Redo Log buffer is also excluded from core dumps (from this comment).

I have doubts about the absence of a configuration parameter for controlling the feature. Even if the InnoDB Buffer Pool (as written above, the feature also covers the InnoDB Redo Log buffer, but I will only mention the InnoDB Buffer Pool in the rest of this post for brevity) is not often required in core dumps, Marko Mäkelä, InnoDB Engineer at MariaDB.com, mentioned sometimes needing it to investigate deadlocks, corruption or race conditions. Moreover, I was recently asked, in a support ticket, to provide a core dump to understand a crash in MariaDB 10.2 (public bug report in MDEV-15608): it looks to me that the InnoDB Buffer Pool would be useful here. Bottom line: having the InnoDB Buffer Pool (and Redo Log buffer) in core dumps might not be regularly useful, but it is sometimes needed.
To include the InnoDB Buffer Pool in core dumps, DBAs can (#1) install DEBUG binaries or (#2) use a debugger to call the buf_madvise_do_dump function (a nice touch from Daniel to compensate for the absence of a configuration parameter, but there are caveats described below). Both solutions are suboptimal in my humble opinion. For #2, there are risks and drawbacks to using a debugger on a live production database (when it works … see below for a war story). For #1, and unless I am mistaken, DEBUG binaries are not available from the MariaDB download site. This means that they will have to be built by engineers of your favorite support provider, or that DBAs will have to compile them manually: this is a lot of work to expect from either party. I also think that the usage of DEBUG binaries in production should be minimized, not encouraged (DEBUG binaries are for developers, not DBAs), so I feel we are heading in the wrong direction. Bottom line: I would not be surprised (and I am not alone) if a parameter were added in a future release to ease investigations of InnoDB bugs.

Out of curiosity, I checked the core dump sizes for some versions of MySQL and MariaDB with dbdeployer (if you have not tried it yet, you should probably spend time learning how to use it: it is very useful). Here are my naive first results with default configurations and a freshly started mysqld:
487 MB and 666 MB core dumps with MySQL 5.7.22 and 8.0.11 respectively,
673 MB and 671 MB core dumps with MariaDB 10.2.15 and MariaDB 10.3.7 respectively.

I tried to understand where the inflation comes from in MySQL 8.0.11, but I tripped on Bug#90561, which blocked my investigation. We will have to wait for 8.0.12 to know more…
Back to the feature, I was surprised to see no shrinking between MariaDB 10.2 and 10.3. To make sure something was not wrong, I tried to get the InnoDB Buffer Pool into the core dump by calling the buf_madvise_do_dump function. I used the slides from the gdb tips and tricks for MySQL DBAs talk by Valerii Kravchuk presented at FOSDEM 2015 (I hope a similar talk will be given soon at Percona Live, as my gdb skills need a lot of improvement), but I got the following result:

$ gdb -p $(pidof mysqld) -ex "call buf_madvise_do_dump()" -batch
[…]
No symbol "buf_madvise_do_dump" in current context.
After investigation, I understood that the generic MariaDB Linux packages that I used with dbdeployer are compiled without the feature. A reason could be that there is no way to know that those packages will be used on a Linux 3.4+ kernel (without a recent enough kernel, the MADV_DONTDUMP argument does not exist for the madvise system call). To be able to test the feature, I would either have to build my own binaries or try packages for a specific distribution. I chose to avoid compilation, but this was more tedious than I thought…

By the way, maybe the buf_madvise_do_dump function should always be present in binaries and return a non-zero value when failing, with a detailed message in the error logs. This would have spared me the time spent understanding why it did not work in my case. I opened MDEV-16605: Always include buf_madvise_do_dump in binaries for that.

Back to my tests: to see the feature in action, I started a Ubuntu 16.04.4 LTS instance in AWS (it comes with a 4.4 kernel). But again, I could not call buf_madvise_do_dump. After more investigation, I understood that the Ubuntu and Debian packages are not compiled with symbols, so calling buf_madvise_do_dump cannot easily be done on those (I later learned that there are mariadb-server-10.3-dbgsym packages, but I did not test them). I ended up falling back to CentOS 7.5, which comes with a 3.10 kernel, and it worked ! Below are the core dump sizes with and without calling buf_madvise_do_dump:

527 MB core dump on MariaDB 10.3.7 (without calling buf_madvise_do_dump),
674 MB core dump on MariaDB 10.3.7 (with calling buf_madvise_do_dump).

I was surprised by bigger core dumps in MariaDB 10.3 than in MySQL 5.7, so I spent some time looking into that. It would have been much easier with the Memory Instrumentation from Performance Schema, but this is not yet available in MariaDB. There is a Jira ticket opened for that (MDEV-16431); if you are also interested in this feature, I suggest you vote for it.
I guessed that the additional RAM used by MariaDB 10.3 (compared to MySQL 5.7) comes from the caches for the MyISAM and Aria storage engines. Those caches, whose sizes are controlled by the key_buffer_size and aria_pagecache_buffer_size parameters, are 128 MB by default in MariaDB 10.3 (more discussion about these sizes below). I tried shrinking both caches to 8 MB (the default value in MySQL since at least 5.5), but I got another surprise:
> SET GLOBAL key_buffer_size = 8388608;
Query OK, 0 rows affected (0.001 sec)
> SET GLOBAL aria_pagecache_buffer_size = 8388608;
ERROR 1238 (HY000): Variable 'aria_pagecache_buffer_size' is a read only variable
The aria_pagecache_buffer_size parameter is not dynamic ! This is annoying, as I like tuning parameters to be dynamic, so I opened MDEV-16606: Make aria_pagecache_buffer_size dynamic for that. I tested by shrinking only the MyISAM cache dynamically and by modifying the startup configuration for Aria. The results for the core dump sizes are the following:

527 MB core dump for the default behavior,
400 MB core dump by shrinking the MyISAM cache from 128 MB to 8 MB,
268 MB core dump by also shrinking the Aria cache from 128 MB to 8 MB.
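For reference, the shrunk cache sizes used for the last result above would look like this in the configuration file (a sketch; the Aria setting has to be in the file because the variable is not dynamic):

```ini
[mysqld]
# MyISAM index cache: dynamic, can also be changed with SET GLOBAL
key_buffer_size = 8M
# Aria page cache: read-only at runtime, must be set before startup
aria_pagecache_buffer_size = 8M
```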

We are now at a core dump size smaller than MySQL 5.7.22: this is the result I was expecting.
I did some more tests with a larger InnoDB Buffer Pool and with a larger InnoDB Redo Log buffer while keeping MyISAM and Aria cache sizes to 8 MB. Here are the results of the sizes of the compact core dump (default behavior) vs the full core dump (using gdb):

340 MB vs 1.4 GB core dumps when growing the InnoDB Buffer Pool from 128 MB to 1 GB,
357 MB vs 1.7 GB core dumps when also growing the InnoDB Redo Log buffer from 16 MB to 128 MB.

I think the results above show the usefulness of the no InnoDB Buffer Pool in core dump feature.

Potential Improvements of the Shrinking Core Dump Feature

The end goal of excluding the InnoDB Buffer Pool from core dumps is to make generating and working with those files easier. As already mentioned above, the space and time taken to save core dumps are the main obstacles, and sharing them is also an issue (including leaking a lot of user data).
Ideally, I would like to always run MySQL/MariaDB with core dumps enabled on crash (I see one exception: when using database-level encryption, to avoid leaking data). I even think this should be the default behavior, but this is another discussion that I will not start here. My main motivation is that if/when MySQL crashes, I want all the information needed to understand the crash (and possibly report a bug) without having to change parameters, restart the database, and generate the same crash again. Obviously, this configuration is unsuitable for servers with a lot of RAM and a large InnoDB Buffer Pool. MariaDB 10.3 makes a big step forward by excluding the InnoDB Buffer Pool (and Redo Log buffer) from core dumps, but what else could be done to achieve the goal of always running MySQL with core dump enabled ?
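As a reminder, and as a sketch only (the option names below exist in mysqld and mysqld_safe, but your platform's service manager may need different knobs, e.g. LimitCORE under systemd), running with core dumps enabled usually involves something like:

```ini
[mysqld]
# Ask mysqld to write a core file when it crashes
core-file

[mysqld_safe]
# Raise the core file size limit (ulimit -c) for the server process
core-file-size = unlimited
```

On the OS side, kernel.core_pattern must point to a writable location, and fs.suid_dumpable may need to be set when mysqld drops privileges after starting as root.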
There is a pull request to exclude the query cache from core dumps (also by Daniel Black, thanks for this work). When MariaDB is run with a large query cache (and I know this is unusual, but if you know of a valid real world use case, please add a comment below), excluding it from core dumps is good. But I am not sure this is a generally needed improvement:

MySQL 8.0 has retired the query cache,
the query cache is disabled by default from MariaDB 10.1.7,
and the default value for the query cache size was zero before MariaDB 10.1.7.

It looks like there is a consensus that the query cache is a very niche feature and should otherwise be disabled, so this work might not be the one that will benefit the most people. Still good to be done though.
I would like similar work to be done on MyISAM, Aria, TokuDB and MyRocks. As we saw above, there is an opportunity, for default deployments, to remove 256 MB from core dumps by excluding the MyISAM and Aria caches. I think this work is particularly important for those two storage engines, as they are loaded by default in MariaDB. By the way, and considering the relatively low usage of the MyISAM and Aria storage engines, maybe the default value for their caches should be lower: I opened MDEV-16607: Consider smaller defaults for MyISAM and Aria cache sizes for that.
I cannot think of any other large memory buffers that I would like to exclude from core dumps. If you think about one, please add a comment below.
Finally, I would like the shrinking core dump feature to also appear in Oracle MySQL and Percona Server, so I opened Bug#91455: Implement core dump size reduction for that. As an anecdote, I was recently working on a Percona Server crash in production, and we were reluctant to enable core dumps because of the additional minutes of downtime needed to write the file to disk. In this case, the no InnoDB Buffer Pool in core dumps feature would have been very useful !

Some bugs and spring pilgrimage to Percona Live Santa Clara 2018

I am now in an airport, waiting for one of the four flights that will bring me to Percona Live Santa Clara 2018.  This is a good time to write some details about my tutorial on parallel replication.  But before talking about Percona Live, I will share thoughts on MySQL/MariaDB bugs that caught my attention in the last weeks/months (Valeriy: you clearly have an influence on me).

MySQL/MariaDB

Next week in Brussels: Parallel Replication at the MySQL Pre-FOSDEM Day

FOSDEM is next weekend and I am talking about Parallel Replication on Friday, February 2nd at the MySQL Pre-FOSDEM Day (there might be tickets left in case of cancellation, attendance is free of charge).  During this talk, I will show benchmark results of MySQL 8.0 parallel replication on Booking.com real production environments.  I thought I could share a few things before the talk so here it

More Write Set in MySQL: Group Replication Certification

This is the third post in the series on Write Set in MySQL.  In the first post, we explored how Write Set enables better parallel replication in MySQL 8.0.  In the second post, we saw how the MySQL 8.0 improvement is an extension of the work done in MySQL 5.7 to avoid replication delay/lag in Group Replication.  In this post, we will see how Write Set is used in Group Replication to detect

Write Set in MySQL 5.7: Group Replication

In my previous post, I wrote that Write Set is not only in MySQL 8.0 but also in MySQL 5.7, though a little hidden.  In this post, I describe Write Set in 5.7, which brings us into the inner workings of Group Replication.  I am also using this opportunity to explain and show why members of a group can replicate faster than a standard slave.  We will also see the impacts, on Group Replication,

An update on Write Set (parallel replication) bug fix in MySQL 8.0

In my MySQL Parallel Replication session at Percona Live Santa Clara 2017, I talked about a bug in Write Set tracking for parallel replication (Bug#86078).  At the time, I did not fully understand what was going wrong, but since then, we (Engineers at Oracle and me) understood what happened and the bug is supposed to be fixed in MySQL 8.0.4.  This journey taught me interesting MySQL behavior and

A crashing bug in MySQL: the CREATE TABLE of death (more fun with InnoDB Persistent Statistics)

I ended one of my last posts – Fun with InnoDB Persistent Statistics – with a cryptic sentence: there is more to say about this but I will stop here for now.  What I did not share at the time is the existence of a crashing bug somehow related to what I found.  But let’s start with some context.

In Bug#86926, I found a way to put more than 64 characters in the field table_name of the

The danger of no Primary Key when replicating in RBR (and a partial protection with MariaDB 10.1)

TL;DR: unless you know what you are doing, you should always have a primary key on your tables when replicating in RBR (and maybe even all the time).

TL;DR2: MariaDB 10.1 has an interesting way to protect against missing a primary key (innodb_force_primary_key) but it could be improved.

A few weeks ago, I was called off hours because replication delay on all the slaves from a replication chain

More Details about InnoDB Compression Levels (innodb_compression_level)

In one of my previous posts, I shared InnoDB table compression statistics for a read-only dataset using the default value of innodb_compression_level (6).  In it, I claimed, without giving much detail, that using the maximum value for the compression level (9) would not make a big difference.  In this post, I will share more details about this claim.

TL;DR: tuning innodb_compression_level is not
