
This Week in Data with Colin Charles 27: Percona Live Tutorials Released and a Comprehensive Review of the FOSDEM MySQL DevRoom

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.
Percona Live Santa Clara 2018 update: tutorials have been announced. The committee rated over 300 talks, and easily 70% of the schedule should go live next week as well; in practice, that means you should see about 50 talks announced. There's been great competition: we only have 70 slots in total, so roughly 1 in 5 talks was picked. Talk about a competitive ratio.
FOSDEM
FOSDEM was truly awesome last week. From a Percona standpoint, we had a lot of excellent booth traffic (being outside the PostgreSQL room on Saturday, and not too far from the MySQL room on Sunday). We gave away bottle openers (useful in Brussels with all the beer; this year we tried a new design with a magnet so you can attach it to your fridge), stickers, and some brochures, but most of all we had plenty of great conversations. There was quite a crowd from Percona, and it was excellent to see the MySQL & Friends DevRoom almost constantly full! A few of us brave souls managed to stay there the whole day, with barely any breaks, so as to enjoy all the talks.
I found the quality of the talks extremely high. When it comes to a community-run event, with all content picked by an independent program committee, FOSDEM really sets the bar high. There is plenty of competition to get a good talk in, and I enjoyed everything we picked (yes, I was on the committee too). We've had plenty of events in the ecosystem that had "MySQL" or related days of some sort, but FOSDEM might be the only one that has really survived. I understand we will have a day of some sort at SCALE16x, but even that has been scaled down. So if you care about the MySQL ecosystem, you will really want to ensure that you are at FOSDEM next year.
This year, we started with the usual MySQL Day on Friday. I could not be present, as I was at the CentOS Dojo, giving a presentation. So, the highlight of Friday for me? The community dinner. Over 80 people showed up, there was a waiting list, and lots of people were trying to get tickets at the last minute. Many missed out too; sorry, better luck next year, and hopefully we will get a larger venue going forward. I really thank the organizers for this; we affectionately refer to them as the Belconians (i.e. a handful of Perconians based in Belgium). The conversation, the food, the drink: they were all excellent. It's good to see representation from all parts of the community: MySQL, Percona, MariaDB, Pythian, and others. So thank you again, Liz, Dimitri, Tom, and Kenny in absentia. I think Tjerk also deserves special mention for always helping (this year with the drinks).
As for FOSDEM itself, beyond the booth, I think the most interesting stuff was the talks. There are video recordings and slides of pretty much all the talks, but I will also give you the "Cliff's Notes" versions here.
MySQL DevRoom talk quick summaries
Beyond WHERE and GROUP BY – Sergei Golubchik

EXCEPT is in MariaDB Server 10.3

recursive CTEs are good for hierarchical data, graphs, and data generation; they are Turing complete (you can even use them to solve Sudoku)
non-recursive CTEs can be an alternative syntax for subqueries in the FROM clause

Window functions:

Normal: one result per row, depending on that row only
Aggregate: one result per group, depending on the whole group
Window: one result per row, depending on the whole group

System versioned tables with AS OF
Aggregate stored functions
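Both recursive CTEs and window functions can be tried outside MySQL or MariaDB too. Here is a minimal sketch using Python's bundled sqlite3 module (assuming your Python ships SQLite 3.25+, which supports both features; the table and data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recursive CTE as a data generator: the numbers 1..5.
seq_rows = conn.execute("""
    WITH RECURSIVE seq(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM seq WHERE n < 5
    )
    SELECT n FROM seq
""").fetchall()
print(seq_rows)  # [(1,), (2,), (3,), (4,), (5,)]

# Window function: one result per row, computed over the whole group;
# an aggregate would instead collapse each group to a single row.
conn.execute("CREATE TABLE t (grp TEXT, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", 1), ("a", 2), ("b", 10)])
win_rows = conn.execute("""
    SELECT grp, val, SUM(val) OVER (PARTITION BY grp) AS grp_total
    FROM t ORDER BY grp, val
""").fetchall()
print(win_rows)  # [('a', 1, 3), ('a', 2, 3), ('b', 10, 10)]
```

Note how every row keeps its own `val` while also seeing the whole group's total: that is the "one result per row, depending on the whole group" behaviour from the talk.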

MySQL 8.0 Performance: InnoDB Re-Design – Dimitri Kravtchuk

Contention-Aware Transaction Scheduling (CATS), since 8.0.3. Not all transactions are equal, and FIFO may not be optimal: unblock the most blocking transactions first.
CATS (VATS) had a few issues, and there were bugs (they thought everything worked since MariaDB Server had implemented it). They spent about 9 months fixing everything.
Where does CATS help? Workloads hitting row lock contentions. You can monitor via SHOW ENGINE INNODB MUTEX.
the main variable is repeatable read versus read committed transaction isolation on the same workload. You really need to understand your workload when it comes to VATS.

MySQL 8.0 Roles – Giuseppe Maxia

Created like a user, granted like privileges. You need to activate them to use them.
Before roles, you created a user, then grant, grant, and more grants… Add another user? Same deal. Lots of repetitive work and many chances to make mistakes.
Faster user administration – define a role, assign it many times. Centralized grant handling – grant and revoke privileges to roles, add/edit all user profiles.
You need to remember to set the default role.
A user can have many roles; default role can be a list of roles.
Roles are users without a login – roles are saved in user tables. This is useful from an account lock/unlock perspective.
You can grant a user to a user
SET ROLE is for session management; SET DEFAULT ROLE is a permanent assignment of a role to a user. SET ROLE DEFAULT means assign the default role for this user for this session.

The role_edges table reports which roles are assigned to which users. default_roles keeps track of the current default roles assigned to users. A default role may not exist.
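The grant-once, assign-many workflow above can be reduced to a toy model. A hedged Python sketch (the names and data structures are made up, and this is not how the mysql.* grant tables actually work):

```python
# Toy model of the role workflow: grant privileges to a role once, assign
# the role to many users, and resolve a user's effective privileges as the
# union of the roles active in the session. Purely illustrative.

role_privs = {
    "app_read":  {"SELECT"},
    "app_write": {"SELECT", "INSERT", "UPDATE"},
}
user_roles = {"alice": ["app_read", "app_write"], "bob": ["app_read"]}
default_role = {"alice": "app_write", "bob": "app_read"}

def effective_privs(user, active_roles=None):
    # Like MySQL, a role contributes nothing until it is activated
    # (SET ROLE, or the default role at login).
    roles = active_roles if active_roles is not None else [default_role[user]]
    privs = set()
    for role in roles:
        privs |= role_privs[role]
    return privs

print(sorted(effective_privs("bob")))                         # ['SELECT']
print(sorted(effective_privs("alice", user_roles["alice"])))  # all roles
```

The point of centralization: changing `role_privs["app_read"]` in one place changes what every assigned user can do, instead of repeating grants per user.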

Histogram support in MySQL 8.0 – Øystein Grøvlen

You can now do ANALYZE TABLE tbl UPDATE HISTOGRAM ON col WITH n BUCKETS;
New storage engine API for sampling (default implementation is full table scan even when sampling)
Histogram is stored in a JSON column in the data dictionary. Grab this from the INFORMATION_SCHEMA.
Histograms are useful for columns that are not the first column of any index, and used in WHERE conditions of JOIN queries, queries with IN-subqueries, ORDER BY … LIMIT queries. Best fit: low cardinality columns (e.g. gender, orderStatus, dayOfWeek, enums), columns with uneven distribution (skew), stable distribution (do not change much over time)
How many buckets? These are equi-height histograms; 100 buckets should be enough.
Histograms are stored in the data dictionary, so will persist over restarts of course.
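To see why equi-height buckets handle skew well, here is a rough, illustrative sketch of the bucketing idea in Python (this is not MySQL's actual algorithm; real histograms, for instance, avoid splitting one value across buckets):

```python
def equi_height_buckets(values, n_buckets):
    """Split sorted values into buckets holding roughly equal row counts.
    Returns one (lower, upper, count) tuple per bucket."""
    vals = sorted(values)
    size = max(1, len(vals) // n_buckets)
    return [(chunk[0], chunk[-1], len(chunk))
            for chunk in (vals[i:i + size] for i in range(0, len(vals), size))]

# Heavily skewed data: 90 copies of 1, plus the values 2..11.
data = [1] * 90 + list(range(2, 12))
buckets = equi_height_buckets(data, 4)
for low, high, count in buckets:
    print(low, high, count)
# Every bucket covers 25 rows; the last one stretches from 1 to 11.
# That stretching is exactly how equi-height buckets absorb skew, while
# equi-width buckets would cram 90% of the rows into one bucket.
```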

Let’s talk database optimizers – Vicențiu Ciorbaru

Goal: produce a query plan that executes your query in the fastest time possible.
Condition pushdown through PARTITION BY, there is also a comparison with PostgreSQL
The "split grouping for derived" optimization is in MariaDB Server 10.3 only

Read more at splitgroupingderived=on — https://github.com/MariaDB/server/commit/b14e2b044b (and the equivalent MDEV: https://jira.mariadb.org/browse/MDEV-13369)

TLS for MySQL at Large Scale – Jaime Crespo

Literally took 3 lines in the my.cnf to turn on TLS
https://dbtree.wikimedia.org
They wanted to do a data centre failover and wanted to ensure replication would be encrypted.
They didn’t have proper orchestration in place (MySQL could have this too). Every time OpenSSL or MySQL had to be upgraded, the daemon needed restarting. If there was an incompatible change, you had to sync master/replicas too.

The automation and orchestration that Wikipedia uses: https://fosdem.org/2018/schedule/event/cumin_automation/ (it is called Cumin: https://wikitech.wikimedia.org/wiki/Cumin)

Server-side OpenSSL support was poor, so they had to deploy their own wmf-mysql and wmf-mariadb packages
Currently using MariaDB 10.0, and looking to migrate to MariaDB 10.1
Client library pain they’ve had
TLSv1.2 from the beginning (2015).
Connecting is 20-50x slower; the impact on actual query performance is less than 5%. Just fix client libraries: make them use persistent connections. They are now very interested in ProxySQL for this purpose.
https://grafana.wikimedia.org/?orgId=1
Monty asks, would a double certificate help? Jaime says sure. But he may not actually use double certificates; might not solve CA issues, and the goal is not to restart the server.
Monty wonders why not to upgrade to 10.2? “Let’s talk outside because it’s a much larger question.”

MySQL InnoDB Cluster – Miguel Araújo

group replication: update everywhere (multi-master), virtually synchronous replication, automatic server failover, distributed recovery, group reconfiguration, GCS (group communication system, an implementation of Paxos). HA is a critical factor.
mysqlsh: interactive and batch operations. Document store (CRUD and relational access)
admin API in mysqlsh: create & manage clusters, hide complexity of configuration/provisioning/orchestration of the InnoDB clusters. Works with JavaScript and Python
Usability. HA out of the box.
It’s easy to join a new node; new node goes into recovery mode (and as long as you have all the binary logs, this is easy; otherwise start from a backup)
SET PERSIST – run a command remotely, and the configuration is persisted in the server
Network flapping? Group replication will just reject the node from the cluster if it's flapping too often

Why we’re excited about MySQL 8 – Peter Zaitsev

Native data dictionary – atomic, crash-safe DDL, no more MyISAM system table requirements
Fast INFORMATION_SCHEMA
utf8mb4 as default character set
Security: roles, breakdown of SUPER privileges, password history, faster caching SHA-2 authentication (the new default), builds using OpenSSL (like Percona Server), skip-grants blocks remote connections, logs now encrypted when tablespace encryption is enabled
Persistent AUTO_INCREMENT
auto-managed undo tablespaces – do not use system table space for undo space. Automatically reclaim space on disks.
Self-tuning, limited to InnoDB (innodb_dedicated_server to auto-tune)
partial in-place update for JSON – update a field in a JSON object without a full rewrite. Good for counters/statuses/timestamps. Update/removal of an element is supported
Invisible indexes – test impact of dropping indexes before actually dropping them. Maintained but unused by the optimizer. If not needed or used, then drop away.
TempTable storage engine – a more efficient storage engine for internal temporary tables. Efficient storage for VARCHAR and VARBINARY columns. Good for GROUP BY queries. Doesn't support BLOB/TEXT columns yet (these revert to InnoDB temporary tables for now)
Backup locks – prevent operations which may result in inconsistent backups. LOCK INSTANCE FOR BACKUP (something Percona Server has had before)
Optimizer histograms – detailed statistics on columns, not just indexes

improved cost model for the optimizer – www.unofficialmysqlguide.com

Performance Schema – faster (via "fake" indexes), error instrumentation, response time histograms (global & per query), digest summaries
SELECT * FROM sys.session – a fast potential replacement for SHOW PROCESSLIST
RESTART (command)
SET PERSIST – e.g. change the buffer pool size, and this helps during a restart
assumes default storage is SSD now
binary log on by default, log_slave_updates enabled by default, and log expires after 30 days by default
query cache removed. Look at ProxySQL or some other caching solution
native partitioning only – remove partitions from MyISAM or convert to InnoDB
resource groups – isolation and better performance (map queries to specific CPU cores; can jail your costly queries, like analytical queries)
Feature requests: better single-thread performance; there is still no parallel query support

MySQL Test Framework for Support and Bugs Work – Sveta Smirnova

MTR allows you to add multiple connections
has commands for flow control

ProxySQL – GTID Consistent Reads – René Cannaò, Nick Vyzas

threshold is configurable in increments of 1 second. Replication lag can be monitored with ProxySQL. Want to ensure you don’t have stale reads.
Why is GTID important? To guarantee consistency. Auto-positioning for restructuring topologies.
--session-track-gtids is an important feature which allows sending the GTID for a transaction on the OK packet. Not available in MariaDB.
There is a ProxySQL Binlog Reader now – GTID information about a MySQL server to all connected ProxySQL instances. Lightweight process to run on your MySQL server.
ProxySQL can be configured to enforce GTID consistency for reads on any hostgroup/replication hostgroup.
Live demo by René
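The consistency rule ProxySQL enforces can be reduced to a toy model: remember the GTID of your last write (the one reported on the OK packet when session_track_gtids is enabled) and only read from a replica that has already executed it. A hedged Python sketch with made-up names, nothing like ProxySQL's real implementation:

```python
# Toy model of GTID-consistent read routing: a read is only sent to a
# replica whose executed GTID set already contains the GTID of the
# session's last write. Purely illustrative.

replicas = {
    "replica1": {"uuid:1", "uuid:2", "uuid:3"},  # executed GTID sets
    "replica2": {"uuid:1", "uuid:2"},            # lagging behind
}

def consistent_read_candidates(last_write_gtid):
    """Replicas that can serve a read without returning stale data."""
    return [name for name, executed in replicas.items()
            if last_write_gtid in executed]

# The session's last write committed as uuid:3, so only the replica that
# has applied it is eligible; older reads could go anywhere.
print(consistent_read_candidates("uuid:3"))  # ['replica1']
print(consistent_read_candidates("uuid:2"))  # ['replica1', 'replica2']
```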

Turbocharging MySQL with Vitess – Sugu Sougoumarane

trend for the cloud: container instances, short-lived containers, tolerate neighbors, discoverability. No good tools yet for Kubernetes.
non-ideal options: application-level sharding, NoSQL, paid solutions, NewSQL (CockroachDB, TiDB, Yugabyte)
Vitess: leverage MySQL at massive scale, opensource, 8+ years of work, and multiple production examples
Square uses Vitess for Square Cash application.
Can MySQL run on Docker? Absolutely, many of the companies do huge QPS on Docker.
YouTube does a major re-shard every 2-3 months. No one notices nowadays when that happens.
app server connects to vtgate, and only underneath it’s a bunch of smaller databases with vttablet + mysqld. The lockserver is what makes it run well in the cloud.
pluggable architecture with no compromise on performance: monitoring, health check, ACLs, tracing, more.
at most, it adds about 2ms overhead to connections
Go coding standards are enforced, unit tests with strict coverage requirements, end-to-end tests, Travis, CodeClimate and Netlify. Readability is king.

On February 5 2018, it will be a CNCF project, after one year of due diligence. They said there was nothing to compare it with; they looked at maturity and contributors. It's becoming a truly community-owned project! (The "CNCF to Host Vitess" announcement is already live.)

roadmap: full cross-shard queries, migration tools, simplify configurability, documentation.
full MySQL protocol, but a limited query set – they want to get it to a point where it accepts a full MySQL query.

Orchestrator on Raft – Shlomi Noach

Raft: guaranteed to be in-order replication log, an increasing index. This is how nodes choose a leader based on who has the higher index. Get periodic snapshots (node runs a full backup).
HashiCorp raft, a Golang raft implementation, used by Consul
orchestrator manages topology for HA topologies; also want orchestrator to be highly available. Now with orchestrator/raft, remove the MySQL backend dependency, and you can have data center fencing too. Now you get: better cross-DC deploys, DC-local KV control, and also Kubernetes friendly.
n orchestrator nodes, each node still running its own backend (either MySQL or SQLite). Orchestrator provides the communication for SQLite between the nodes. Only one node (the Raft leader) will handle failovers.
implementation & deployment @ GitHub – one node per DC (deployed at 3 different DCs), 1-second raft polling interval, 2 major DCs plus one in the cloud. Step-down, raft-yield, SQLite-backed log store, and still a MySQL backend (the SQLite backend use case is in the works)
They patched the HashiCorp raft library. The library doesn't care about the identity of nodes, but GitHub does want to control the identity of the leader: there is an "active" data center, and locality is important. This is what they mean by raft-yield (picking a candidate leader).
The ability for a leader to step down is also something they had to patch.
HashiCorp Raft only supports LMDB and another database, so the replication log is now kept in a relational SQLite backed log store. Another patch.
once orchestrator can’t run its own self-health check, it recognizes this. The application can tell raft now that it’s stepping down. Takes 5 seconds to step down, and raft then promotes another orchestrator node to be the leader. This is their patch.
can also grab leadership
DC fencing handles network partitioning.
orchestrator is Consul-aware. Upon failover, orchestrator updates Consul KV with the identity of the promoted master.
considerations to watch out for: what happens if, upon replay of the Raft log, you hit two failovers for the same cluster? NOW() and otherwise time-based assumptions. Reapplying snapshot/log upon startup
roadmap: use Kubernetes (cluster IP based configuration in progress, already container friendly via auto-re-provisioning of nodes via Raft)
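One clarification on the "nodes choose a leader based on who has the higher index" point above: in Raft, a voter grants its vote only to a candidate whose log is at least as up-to-date as its own, comparing the last log entry's term first and its index second. A minimal sketch of that rule (an illustration of the Raft paper's comparison, not HashiCorp's code):

```python
# Raft's "up-to-date log" voting rule, sketched: higher last term wins;
# equal terms fall back to comparing the last log index.

def log_is_up_to_date(candidate, voter):
    """candidate and voter are (last_log_term, last_log_index) tuples."""
    cand_term, cand_index = candidate
    voter_term, voter_index = voter
    if cand_term != voter_term:
        return cand_term > voter_term
    return cand_index >= voter_index

print(log_is_up_to_date((2, 10), (2, 8)))  # True: same term, longer log
print(log_is_up_to_date((1, 99), (2, 5)))  # False: an older term loses
```

GitHub's raft-yield patch works a level above this rule: the rule decides who *may* win, while yielding steers the election toward the node they *want* to win.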

MyRocks Roadmaps – Yoshinori Matsunobu

Facebook has a large User Database (UDB). Social graph, massively sharded, low latency, automated operations, pure flash storage (constrained by space, not CPU/IOPS)
They have a record cache in front of MySQL – TAO – for reads. If the cache misses, it hits the database; all write requests go through MySQL. UDB has to be fast to ensure a good user experience.
Facebook also runs 2 instances of MySQL on the same machine, because CPU wasn't the bottleneck, and the space savings were awesome.
design decisions: clustered index (same as InnoDB), slower for reads, faster for writes (bloom filters, column family), support for transactions including consistency between binlog and MyRocks. Faster data loading/deletes/replication, dynamic options (instead of having to restart mysqld), TTL (comparable to HBase TTL feature, specify the TTL, any data older than time, can be removed), online logical (for recovery purposes) & binary backup (for creating replicas)
Pros: smaller space, better cache hit rate, writes are faster so you get faster replication, much smaller bytes written
Cons: no statement-based replication, gap locks, foreign keys, full-text indexes, or spatial index support. Need to use case-sensitive collations for performance. Reads are slower, especially if the data fits in memory. Dependent on file system and OS; lack of solid direct I/O support (uses buffered I/O). You need a kernel newer than 4.6. Too many tuning options beyond the buffer pool, such as bloom filters, compactions, etc.
https://twitter.com/deniszh/status/960163082642382849
Completed InnoDB to MyRocks migration. Saved 50% space in UDB compared to compressed InnoDB.
Roadmaps: getting in MariaDB and Percona Server for MySQL. Read Mark’s blog for matching read performance vs InnoDB. Supporting mixed engines. Better replication and bigger instance sizes.
mixed engines: InnoDB and MyRocks on the same instance, though a single transaction cannot span engines. Plan to extend xtrabackup to integrate `myrocks_hotbackup`. Backport gtid_pos_auto_engines from MariaDB?
Removing the engine log: writes currently go to both the binlog and the engine log, which requires 2PC and ordered commits. Use one log? Either the binlog (or a binlog-like service) or the RocksDB WAL? They rely on the binlog now (semi-sync, binlog consumers), and need to determine how much performance is gained by no longer writing to the WAL.
Parallel replication apply is important in MySQL 8
support bigger instance sizes: a shared-nothing database is not a general-purpose database. Today you can get 256GB+ RAM and 10TB+ flash on commodity servers. Why not run one big instance and put everything there? Bigger instances may help general-purpose small-to-mid applications. Then you don't have to worry about sharding; atomic transactions, joins and secondary keys will just work. Amazon Aurora today supports a 60TB instance!
today: you can start deploying slaves with consistency check. Many status counters for instance monitoring.

ProxySQL internals – René Cannaò

reduce latency, scale, maximize throughput. A single instance can handle hundreds of thousands of connections and thousands of backend servers.
threading models: one thread per connection (blocking I/O), thread pooling (non-blocking I/O, scalable).
ProxySQL thread pool implementation: known as "MySQL threads", a fixed (configurable) number of worker threads; all threads listen on the same port(s); client connections are not shared between threads; all threads perform their own network I/O; and it uses poll() (does that scale? there is a reason why they chose poll() over epoll())
threads never share client connections – no need for synchronization, thread contention is reduced, and each thread calls poll(). A possible con is imbalanced load (one thread may have way more connections than another). Is it really a problem? Most of the time, no: connections will automatically balance.
poll() is O(N), epoll() is O(1). poll() is faster than epoll() for fewer connections (around 1,000); performance degrades when there are a lot of connections. So by default it uses poll() instead of epoll(); at around 50,000 connections performance degrades badly, so ProxySQL has auxiliary threads.
MySQL_Session() is implemented as a state machine. Stores metadata associated with the client session (running timers, default hostgroup, etc.)
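Python exposes the same poll() primitive, so the per-thread event loop idea is easy to demonstrate. A minimal sketch (POSIX only; select.poll is unavailable on Windows):

```python
import os
import select

# Minimal poll() example: register one end of a pipe, make it readable,
# and wait for the readiness event, the same primitive each ProxySQL
# worker thread loops over for its own set of connections.
r, w = os.pipe()
poller = select.poll()
poller.register(r, select.POLLIN)

os.write(w, b"ping")
events = poller.poll(1000)   # timeout in milliseconds
fd, event = events[0]
assert event & select.POLLIN
data = os.read(fd, 4)
print(data)  # b'ping'
os.close(r)
os.close(w)
```

The O(N) cost discussed above is in the poll() call itself: the kernel rescans every registered descriptor per call, which is why it wins at ~1,000 descriptors and loses badly at ~50,000.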

MySQL Point-in-time recovery like a rockstar – Frederic Descamps

to do PITR, you need a decent backup!
manual tells you about PITR – read it well
https://bugs.mysql.com/bug.php?id=89375
github.com/lefred/MyUndelete

Releases

Percona Monitoring and Management 1.7.0 (PMM) – This release features improved support for external services, which enables a PMM Server to store and display metrics for any available Prometheus exporter. For example, you could deploy the postgres_exporter and use PMM’s external services feature to store PostgreSQL metrics in PMM. Immediately, you’ll see these new metrics in the Advanced Data Exploration dashboard. Then you could leverage many of the pre-developed PostgreSQL dashboards available on Grafana.com, and with a minimal amount of edits have a working PostgreSQL dashboard in PMM!

MariaDB Server 10.1.31 – usual updates to storage engines, and a handful of bug fixes.

Link List

Amazon Redshift Spectrum – are you using this yet?

Database sharding explained in plain English

Modern SQL Window Function Questions – a quiz that’s quite poignant (MariaDB 10.2+ has window functions; upcoming MySQL 8 has it too).

Upcoming appearances

SCALE16x – Pasadena, California, USA – March 8-11 2018

Feedback
I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.


On InnoDB’s Online DDL

I am completing my preparations for the upcoming FOSDEM talk, and one of the last things I wanted to do as part of them is a quick review of known bugs and problems in the current (as in MySQL 5.7 GA) implementation of so-called "online" DDL for InnoDB tables. In my previous post I already shared my view on another important InnoDB feature, persistent statistics. Unlike that, I do not really hate online DDL; I just try to avoid it if possible and use tools like pt-online-schema-change or gh-ost instead. Not because it is not documented properly (the documentation is quite detailed, though there are still things to clarify) or does not work as designed, but mostly because the term "online" (if we understand it as "not blocking", that is, reads and writes to the table being changed remain available) is a bit misleading (it is more like "less blocking", or "blocking for shorter periods of time", faster and in-place, sometimes), and because it does not work the way one might expect in any kind of replication setup. To be more specific:
Replication ignores LOCK=NONE 🙂 Slave will only start to apply “concurrent” DML after commit, and this leads to a huge replication lag.
In too many cases the entire table is rebuilt (data is (re-)written), in place or by creating a copy, while notable writes during the ALTER TABLE are really required only if we are introducing stricter constraints (and even then we could just validate the table, return an error if some row does not satisfy the new constraint (too long to fit, for example), and change the metadata only if all rows are OK) or adding new indexes (which in any case cannot be used until they are built).
The online log has to be kept (in memory or in a temporary file). There is one such log for each index being created or table being altered. The manual says: "This log file stores data inserted, updated, or deleted in the table during the DDL operation. The temporary log file is extended when needed by the value of innodb_sort_buffer_size, up to the maximum specified by innodb_online_alter_log_max_size. If a temporary log file exceeds the upper size limit, the ALTER TABLE operation fails and all uncommitted concurrent DML operations are rolled back. Thus, a large value for this option allows more DML to happen during an online DDL operation, but also extends the period of time at the end of the DDL operation when the table is locked to apply the data from the log." The problem is that the size depends on the concurrent DML workload and is hard to predict. Note also "when the table is locked" above to understand how "online" this really is…

There are also bugs, and I’d like to discuss some of them:

Bug #82997, "Online DDL fails with". There are not enough public details to be sure what exactly it fails with, but maybe the problems (several are reported) happen when the altered table has a generated column. If this is really so, the bug may be fixed in MySQL 5.7.19+.

Bug #73196, "Allow ALTER TABLE to run concurrently on master and slave". I can not put this better than Andrew Morgan did in this verified feature request: "With online ALTER TABLE it is possible for the DDL operation to run for many hours while still processing DML on that same table. The ALTER TABLE is not started on the slave until after it has completed on the master and it will again take many hours to run on the slave. While the DDL runs on the slave, it is not possible for it to process any transactions which followed the ALTER TABLE on the master as they may be dependent on the changes that were made to the table's schema. This means that the slave will lag the master by many hours while the ALTER TABLE runs and then while it catches up on the backlog of DML sent from the master while that was happening." Both pt-osc and gh-ost resolve this problem, as they take the replication topology into account and can throttle changes on the master if needed. See also this documentation request by Daniël van Eeden, Bug #77619, that lists more limitations of "online" DDL, and check how it helped to clarify them here.

Bug #67286, "InnoDB Online DDL hangs". It ended up as "Not a bug", but there is a good explanation of the exclusive metadata lock set by the "online" ALTER in the comments: "The final (short) phase of ALTER where the internal data dictionary is updated requires exclusive access. That's why the ALTER was blocked by the active transaction having a shared lock on the table." I once studied a similar (and even simpler) case in a lot of detail with gdb; see this blog post. I clearly saw an MDL_EXCLUSIVE lock request for a simple ALTER TABLE … STATS_AUTO_RECALC=1 that (according to the manual) "permits concurrent DML". Another manual page clarifies: "In most cases, an online DDL operation on a table waits for currently executing transactions that are accessing the table to commit or roll back because it requires exclusive access to the table for a brief period while the DDL statement is being prepared. Likewise, the online DDL operation requires exclusive access to the table for a brief time before finishing. Thus, an online DDL statement also waits for transactions that are started while the DDL is in progress to commit or roll back before completing." Dear MySQL Oracle developers, just remove "In most cases" (or clarify it), and this would be fair enough!

Bug #84004, "Manual misses details on MDL locks set and released for online ALTER TABLE". That's my documentation request, filed after spending some time tracing metadata locks usage in gdb. My request is simple (typos corrected): "Describe all kinds of metadata locks used by MySQL, their interactions and order of acquisition and release for most important SQL statements, including (but not limited to) all kinds of online ALTER TABLE statements for InnoDB tables."

Bug #68498, “can online ddl for innodb be more online?”. This report by Mark Callaghan that refers to this detailed study is still “Verified”. Based on the comments to that blog post, it is “enough online”, but the details of implementation were not clearly documented at the moment. Check for the details and clarifications in the comments!

Bug #72109, “Avoid table rebuild when adding or removing of auto_increment settings”. The bug report from Simon Mudd is still “Verified”.

Bug #57583, "fast index create not used during 'alter table foo engine=innodb'". The bug is still "Verified", and I cannot tell from the manual whether this is implemented in MySQL 5.7 or not.

Bug #83557, “Can’t use LOCK=NONE to drop columns in table with virtual columns” – nice “Verified” bug report by Monty Solomon.

Bug #70790, "ALTER TABLE REBUILD PARTITION SHOULD NOT PREVENT DML IN UNAFFECTED PARTITIONS". My former colleague at Oracle, Arnaud Adant, simply asked to provide proper and reasonable support of online DDL for partitioned tables. This bug report is still "Verified", but at least we now have a separate manual page that explains the details and limitations of online DDL with partitioned tables (most of Arnaud's requests are still NOT implemented).

Bug #81819, "ALTER TABLE…LOCK=NONE is not allowed when FULLTEXT INDEX exists". As Marko Mäkelä explains in the last comment of this "Verified" feature request: "However, LOCK=NONE is never supported when a FULLTEXT INDEX exists on the table. Similarly, LOCK=NONE is not supported when SPATIAL INDEX (introduced in MySQL 5.7) exist. Speaking as the author of WL#6255 which implemented ALTER TABLE…LOCK=NONE for InnoDB B-tree indexes in MySQL 5.6, I share the bug reporter's disappointment."

To summarize, online DDL in MySQL 5.7 is surely far above and beyond "fast index creation", but there is still a lot of room for improvement. A real solution (one that performs ALTER TABLE quickly, and without unnecessary changes/writes to data, in far more cases) may come with a real data dictionary in MySQL and support for multiple table versions there, or from ideas like those implemented in MDEV-11369, "Instant add column for InnoDB", and expressed in MDEV-11424, "Instant ALTER TABLE of failure-free record format changes". Until all that is implemented, I'd prefer to rely on good old tools like pt-osc. In any case, we are speaking about backward-incompatible changes to the way MySQL works and stores data now.

Next week in Brussels: Parallel Replication at the MySQL Pre-FOSDEM Day

FOSDEM is next weekend and I am talking about Parallel Replication on Friday, February 2nd at the MySQL Pre-FOSDEM Day (there might be tickets left in case of cancellation; attendance is free of charge). During this talk, I will show benchmark results of MySQL 8.0 parallel replication on Booking.com real production environments. I thought I could share a few things before the talk so here it

This Week in Data with Colin Charles 25: Meltdown/Spectre still dominate, FOSDEM approaches and Timescale gets funding

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.
Still on Meltdown/Spectre: this time MariaDB Corporation has published Meltdown Vulnerability Impact On MariaDB Server – interesting for the comparison between glibc/tcmalloc. There is a worthy Facebook thread about this too, with a bit of chat about MongoDB performance. Officially, MongoDB says a degradation of 10-15%. ScaleGrid has a good post in which they test MongoDB against AWS/Azure/DigitalOcean. It's interesting that on AWS they see a 4-5% hit on insert workloads, but on Azure it's 10-20% and on DigitalOcean it's 30%! (For the balanced workload, it's 2-3% on AWS, 20-25% on Azure, and 30% on DigitalOcean.)
We’ve chosen tutorials for Percona Live Santa Clara, and will be announcing them soon; and by the first week of February most talks as well. Don’t forget you have the option to still get the best price at registration now.
Don’t forget that next week we have FOSDEM — many Perconians will be there and we look forward to meeting you. We have a community table, and Peter Zaitsev has a talk in the main track. Don’t forget the fringe events either: on 2nd February there is the pre-FOSDEM MySQL Day, the MySQL & Friends Community Dinner, and also a CentOS Dojo that I will be speaking at. Looking forward to a packed and fun-filled Friday, Saturday and Sunday!
It’s worth reading this: Timescale raises $16M from Benchmark, NEA, Two Sigma Ventures for the database to help us understand our machines and our world. Timescale has been an active participant in the Percona Live events, and this round of funding bodes very well for the PostgreSQL database world.
Releases

MySQL 8.0.4 Release Candidate – this is a big deal, highly recommended for testing; plenty of blog posts on the MySQL Server Team blog as well as in this very Percona blog. Read the release notes as well.

Percona Monitoring and Management (PMM) 1.6.1 – fresh off the heels of 1.6.0, more new features for PMM
Percona XtraDB Cluster 5.6.38-26.23
MariaDB 5.5.59

Link List

Citus and pg_partman: Creating a scalable time series database on Postgres – Postgres 10 brings you native time partitioning, and with the Citus extension you can create tables distributed across nodes by ID and partitioned by time.
antirez tells us he’s spending a lot of time working on Redis Streams

Cut down database imports by a third using this one weird trick – an open source PostgreSQL bulk loader.

Building a Distributed Log from Scratch, Part 5: Sketching a New System – it’s a multi-part series, all interesting if you have the time to read it.

AWS Aurora Postgres, not a great first impression – I think the best part is the comments, where the product manager chimes in, about shared_buffers, and the fact that they are working on an Aurora PostgreSQL best practices document.

Enhanced replication monitoring in ProxySQL 1.4.4 – where ProxySQL works well with pt-heartbeat.

Stream all the things – Nice video, focusing on streaming architectures for data sets that never end.
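The pt-heartbeat approach mentioned in the ProxySQL link above is simple: a process on the master updates a timestamp row every second, and each replica computes lag as the difference between its clock and that replicated timestamp. A minimal sketch of the replica-side arithmetic (function and variable names are my own, not pt-heartbeat’s):

```python
from datetime import datetime, timezone

def heartbeat_lag_seconds(heartbeat_ts, now=None):
    """Replica-side lag estimate: seconds elapsed since the last
    heartbeat row written on the master. Assumes both clocks are
    NTP-synced; clamps tiny negative skew to zero."""
    now = now or datetime.now(timezone.utc)
    return max(0.0, (now - heartbeat_ts).total_seconds())

master_wrote = datetime(2018, 1, 26, 12, 0, 0, tzinfo=timezone.utc)
replica_now = datetime(2018, 1, 26, 12, 0, 3, tzinfo=timezone.utc)
print(heartbeat_lag_seconds(master_wrote, replica_now))  # → 3.0
```

Unlike Seconds_Behind_Master, this measures actual end-to-end replication delay, which is why proxies like ProxySQL can use it to route reads away from lagging replicas.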

Upcoming appearances

CentOS Dojo Brussels – Brussels, Belgium – February 2 2018

FOSDEM 2018 – Brussels, Belgium – February 3-4 2018

SCALE16x – Pasadena, California, USA – March 8-11 2018

Feedback
I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

This Week in Data with Colin Charles 20: cPanel changes strategy, Percona Live CFP extended

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.
I think the biggest news from last week was from cPanel – if you haven’t already read the post, please do – on Being a Good Open Source Community Member: Why we hesitated on MySQL 5.7. cPanel anticipated MariaDB being the eventual replacement for MySQL, based on movements from Red Hat, Wikipedia and Google. The perceived advantages were transparency around security disclosure and the added features/improvements. Today though, “MySQL now consistently matches or outpaces MariaDB when it comes to development and releases, which in turn is increasing the demand on us for providing those upgraded versions of MySQL by our users.” And maybe a little more telling: “when MariaDB 10.2 became stable in May 2017 it included many features found in MySQL 5.7. However, MySQL reached stable nearly 18 months earlier in October 2015.” (emphasis mine).
So cPanel is going forth and supporting MySQL 5.7. They will continue supporting MariaDB Server for the foreseeable future. This really is cPanel ensuring they are responsive to users: “The people using and building database-driven applications are doing so with MySQL in mind, and are hesitant to add support for MariaDB. Responding to our community’s desires is one of the most important things to us, and this is something that we are hearing asked for from our community consistently.”
I, of course, think this is a great move. Users deserve choice. And MySQL has features that are sometimes still not included in MariaDB Server. Have you seen the Complete list of new features in MySQL 5.7? Or my high-level response to a MariaDB Corporation white paper?
I can only hope to see more people think pragmatically like cPanel. Ubuntu as a Linux distribution still does – you get MySQL 5.7 as the default (very unlike upstream Debian, which ships MariaDB Server nowadays). I used to be a proponent of MariaDB Server being everywhere, when it was community-developed, feature-enhanced, and backward-compatible. However, the moment it stopped being a branch and became a true fork is the moment where trouble lies for users. I think it was still marginally fine with 10.0, and maybe even 10.1, but the ability to maintain feature parity with enhanced features has long gone. Short of a rebase? But then… what would be different from the already popular branch of MySQL called Percona Server for MySQL?
While there are wins and support from cloud vendors, like Amazon AWS RDS and Microsoft Azure, you’ll notice that they offer both MySQL and MariaDB Server. Google Cloud SQL notably only offers MySQL. IBM may be a sponsor of the MariaDB Foundation, but I don’t see their services like Compose offering anything other than MySQL (with group replication, no less!). Platinum member Alibaba Cloud offers MySQL and PostgreSQL. However, Tencent seems to suggest that MariaDB is coming soon? One interesting statistic to watch, naturally, would be user uptake.
Events
From an events standpoint, the Percona Live 2018 Call for Papers has been extended to January 12, 2018. We expect an early announcement of maybe ten talks in the week of January 5. Please submit to the CFP. Have you got your tickets yet? Nab them during our Percona Live 2018 super saver registration while they are at their best price!
FOSDEM has got Sveta and myself speaking in the MySQL and Friends DevRoom, but we also have good news in the sense that Peter Zaitsev is also going to be at FOSDEM – speaking in the main track. We’ll also have plenty of schwag at the stand.
I think it’s important to take note of the updates to Percona bug tracking: yes, it’s Jira all the way. It would be good for everyone to also start looking at how the sausage is made.
Dgraph, a “distributed fast graph database”, just raised $3m and released 1.0. Have you used it?
On a lighter note, there seems to be a tweet going around by many, so I thought I’d share it here. Merry Christmas and Happy Holidays.
He’s making a database
He’s sorting it twice
SELECT * FROM girls_boys WHERE behaviour = 'nice'
SQL Claus is coming to town!
Releases

Percona Monitoring and Management 1.5.3

ProxySQL 1.4.4 – some interesting features include bandwidth throttling, limit connections to backends, monitoring replication lag, and more.
MariaDB Server 10.2 is in CentOS 6 and CentOS 7 via Software Collections.

Link List

Good presentation on multi-user Presto usage.
You’re probably dabbling with containers, so note that Google released container-diff, a tool for quickly comparing container images. Check it out on GitHub.
James Governor (RedMonk), writes On AWS and Pivotal, opinions and overlaps.
That time Larry Ellison allegedly tried to have a professor fired for benchmarking Oracle
Apache Cassandra users, should you use incremental repair?

Upcoming appearances

FOSDEM 2018 – Brussels, Belgium – February 3-4 2018

SCALE16x – Pasadena, California, USA – March 8-11 2018

Feedback
I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

This Week in Data with Colin Charles 19: Percona Live Tickets, Call for Papers and FOSDEM

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community. The Percona Live Call For Papers closes on December 22, but why aren’t you submitting already? Don’t wait till the last minute! Look at our broad scope as well. Worth noting that the best prices […]

This Week in Data with Colin Charles 16: FOSDEM, Percona Live call for papers, and ARM

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community. Hurry up – the call for papers (CFP) for FOSDEM 2018 ends December 1, 2017. I highly recommend submitting, as it’s a really fun, free, and technically-oriented event. Don’t forget that the CFP for Percona […]

Fun with Bugs #57 – On MySQL Bug Reports I am Subscribed to, Part I

I’ve decided to stop reviewing MySQL Release Notes in this series, but that does not mean I am no longer interested in MySQL bugs. At the moment I am subscribed to 91 active MySQL bugs reported by other MySQL users, and in this blog post I am going to present 15 of them, the most recently reported ones. I’d really like to see them fixed, or at least properly processed, as soon as possible.

In some cases I am going to add my speculation on how the bug had better be handled, or maybe highlight some important details about it. It is not my job any more to process/“verify” community bug reports for any kind of MySQL, but I did that for many years, and I’ve spent more than 5 years “on the other side” as a member of the Community, so in some cases I allow myself to share a strong opinion on what might be done differently on the Oracle side.

As a side note, I started to subscribe to MySQL bugs mostly after I left Oracle, as before that I got an email notification about each and every change in every MySQL bug report ever created… Here is the list, starting from the most recent ones:

Bug #88422 – “MySQL 5.7 innodb purge thread get oldest readview could block other transaction”. It is one of those bug reports without a test case from the reporter. It is tempting to set it to “Verified” just “based on code review”, as the 5.7 code quite obviously shows both the holding of trx_sys->mutex and a search of linear complexity in the number of read views in the worst case (when most of them are closed):

    /**
    Get the oldest (active) view in the system.
    @return oldest view if found or NULL */
    ReadView*
    MVCC::get_oldest_view() const
    {
            ReadView*       view;

            ut_ad(mutex_own(&trx_sys->mutex));

            for (view = UT_LIST_GET_LAST(m_views);
                 view != NULL;
                 view = UT_LIST_GET_PREV(m_view_list, view)) {

                    if (!view->is_closed()) {
                            break;
                    }
            }

            return(view);
    }

But probably the current Oracle bug verification rules do not allow it to be marked verified just like that. After all, somebody will have to create a test case… So, my dear old friend Sinisa Milivojevic decided to try to get the bug reporter to provide a test case instead of spending some time trying to create one himself. I am not going to blame him for that; why not try the easy way? 🙂 But I consider his statement in the comment dated [10 Nov 16:21]:

    “… 5.7 methods holds no mutex what so ever…”

a bit wrong, as we can see the mutex is acquired when the get_oldest_view() method is called:

    void
    MVCC::clone_oldest_view(ReadView* view)
    {
            mutex_enter(&trx_sys->mutex);

            ReadView*       oldest_view = get_oldest_view();

            if (oldest_view == NULL) {
    …
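To illustrate what the reporter worries about: the scan above walks the view list while trx_sys->mutex is held, so with many closed read views every caller pays an O(n) cost under a global lock. A toy Python model of just the scan (not InnoDB code; the list layout is simplified):

```python
def get_oldest_view(views):
    """Toy model of MVCC::get_oldest_view(): views are ordered newest to
    oldest, and we walk from the oldest end until the first view that is
    still open. If most views are closed, the whole list gets scanned,
    and in InnoDB this happens while trx_sys->mutex is held."""
    for view in reversed(views):          # UT_LIST_GET_LAST, then ..._PREV
        if not view["closed"]:
            return view
    return None

views = [{"id": 3, "closed": False},      # newest, still open
         {"id": 2, "closed": True},
         {"id": 1, "closed": True}]       # oldest
print(get_oldest_view(views))  # walks past both closed views first
```

The more connections holding (mostly closed) read views, the longer the scan, and the longer other transactions wait on the mutex.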

Bug #88381 – “Predicate cannot be pushed down “past” window function”. Here the bug reporter provided enough hints for a test case. One can probably just check ‘Handler%’ status variables before and after query execution to come to the conclusion. Moreover, it seems an Oracle developer, Dag Wanvik, accepted this as a known limitation, but the bug still remains “Open”, and nobody knows if it was copied to the internal bugs database, got prioritized, or if any work on it is planned any time soon. We shall see. You may also want to monitor MDEV-10855.
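The missing optimization is easy to show in miniature: when a derived table computes a window function partitioned by some column, an outer filter on that same column could safely be pushed into the derived table, but per this report MySQL evaluates the window over all rows first. The query shape looks like this (run here against SQLite via Python’s stdlib purely for illustration; the bug itself concerns MySQL 8.0’s optimizer, and the table and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, grp INTEGER, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, 1, 10), (2, 1, 20), (3, 2, 30), (4, 2, 40)])

# The outer WHERE d.grp = 1 only touches the PARTITION BY column, so
# filtering t *before* computing RANK() would give the same answer --
# that is the predicate pushdown MySQL does not perform here.
rows = conn.execute("""
    SELECT d.id, d.rnk FROM (
        SELECT id, grp, RANK() OVER (PARTITION BY grp ORDER BY val) AS rnk
        FROM t
    ) d
    WHERE d.grp = 1
""").fetchall()
print(sorted(rows))  # → [(1, 1), (2, 2)]
```

Without the pushdown, the window function is materialized over every partition and the filter is applied afterwards, which is exactly what the ‘Handler%’ counters would reveal.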

Bug #88373 – “Renaming a column breaks replication from 5.7 to 8.0 because of impl. collation”. This bug was quickly verified by Umesh Shastry. I expect a lot of “fun” for users upgrading to MySQL 8.0 when it becomes GA, especially in replication setups.

Bug #88328 – “Performance degradation with the slave_parallel_workers increase”. There is no test case, just a general description and ideas about the root cause when semi-sync replication is used. I expect this bug to stay “Open” for a long time, as it is a topic for good research and blog posts like this one — that is, work for a real expert!

Bug #88223 – “Replication with no tmpdir space and InnoDB as tmp_storage_engine can break”. Here we have a clear and simple test case from Sveta Smirnova (no wonder: she also worked on the bugs verification team at MySQL, Sun, and Oracle). I hope Umesh will verify it soon. As a side note, it is explained (in the comments) elsewhere that InnoDB as internal_tmp_disk_storage_engine may not be the best possible option. We do not have this variable and do not plan to support InnoDB for internal temporary tables in MariaDB 10.2+.

Bug #88220 – “compressing and uncompressing InnoDB tables seems to be inconsistent”. See also other, older bug reports mentioned there that are duplicates/closely related, but were not getting proper attention.

Bug #88150 – “‘Undo log record is too big.’ error occurring in very narrow range of str length”. It was reported by my colleague Geoff Montee and is already fixed in recent versions of MariaDB (see MDEV-14051 for the details and some nice examples of gdb usage by a developer)!

Bug #88127 – “Index not used for ‘order by’ query with utf8mb4 character set”. Here I am just curious when bugs like this will be caught by Oracle QA before any public releases.

Bug #88071 – “An arresting Performance degradation when set sort_buffer_size=32M”. Here the test case is clear: just run a sysbench OLTP test at high concurrency with different values of sort_buffer_size. Still, Sinisa Milivojevic decided to explain when the RAM limit may play a role, instead of just showing how well it works (if it does) on a server with enough RAM… Let’s see how this attempt to get the bug reporter to work/explain more may end up…

Bug #87947 – “Optimizer chooses ref over range when access when range access is faster”. A nice example of a case where optimizer trace may be really useful. Øystein Grøvlen kindly explained that “range access and ref access are not comparable costs”. I hope we get a better cost model for such cases in MySQL one day.

Bug #87837 – “MySQL 8 does not start after upgrade to 8.03”. This is actually expected, and even somewhat documented in the release notes: MySQL 8.0.3 is not compatible with any older version. So it is more of a MySQL Installer bug (a component I do not care much about), but I still subscribed to it as yet another source of potential fun during further upgrade attempts.

Bug #87716 – “SELECT FOR UPDATE with BETWEEN AND gets row lock excessively”. I think I already studied once why rows are locked differently by InnoDB with IN() compared to a BETWEEN that selects the same rows. But I’d like to know Oracle’s take on this, and I’d like to study this specific test case in detail one day as well.

Bug #87670 – “Force index for group by is not always honored”. Clear and simple test case, so no wonder it was immediately verified.

Bug #87621 – “Huge InnoDB slowdown when selecting strings without indexes”. I’d like to check with perf one day where the time is mostly spent during this test. For now, I think this is a result of the way “long” data is stored on separate pages in InnoDB. What’s also interesting here is the test case, where R is used to generate the data set.

Bug #87589 – “Documentation incorrectly states that LOAD DATA LOCAL INFILE does not use tmpdir”. This was yet another report from my colleague Geoff Montee. lsof is your friend; maybe I should talk about it one day at FOSDEM (the call for papers is still open 🙂). I like to find and follow bugs and missing details in the MySQL manual, maybe because I would never be able to contribute to it directly as a writer…

So, this list shows my typical recent interests related to MySQL bugs – mostly InnoDB, the optimizer, replication problems, the fine manual, and just some fun details, like the way some Oracle engineers try to avoid working extra hard while processing bugs… I am also happy to know that in some cases MariaDB is able to deliver fixes faster.
