This Week in Data with Colin Charles 27: Percona Live Tutorials Released and a Comprehensive Review of the FOSDEM MySQL DevRoom

Colin Charles

Colin CharlesJoin Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Santa Clara 2018 update: tutorials have been announced. The committee rated over 300+ talks, and easily 70% of the schedule should go live next week as well. In practice, then, you should see about 50 talks announced next week. There’s been great competition: we only have 70 slots in total, so about 1 in 5 talks get picked — talk about a competitive ratio.

FOSDEM

FOSDEM was truly awesome last week. From a Percona standpoint, we had a lot of excellent booth traffic (being outside of the PostgreSQL room on Saturday, and not too far out from the MySQL room on Sunday). We gave away bottle openers — useful in Brussels with all the beer; we tried a new design with a magnet to attach it to your fridge — stickers, some brochures, but most of all we had plenty of great conversations. There was quite a crowd from Percona, and it was excellent to see the MySQL & Friends DevRoom almost constantly full! A few of us brave souls managed to stay there the whole day, barely with any breaks, so as to enjoy all the talks.

I find the quality of talks to be extremely high. And when it comes to a community run event, with all content picked by an independent program committee, FOSDEM really sets the bar high. There is plenty of competition to get a good talk in, and I enjoyed everything we picked (yes, I was on the committee too). We’ve had plenty of events in the ecosystem that sort of had “MySQL” or related days, but FOSDEM might be the only one that has really survived. I understand we will have a day of some sort at SCALE16x, but even that has been scaled down. So if you care about the MySQL ecosystem, you will really want to ensure that you are at FOSDEM next year.

This year, we started with the usual MySQL Day on Friday. I could not be present, as I was at the CentOS Dojo, giving a presentation. So, the highlight of Friday for me? The community dinner. Over 80 people showed up, I know there was a waiting list, and lots of people were trying to get tickets at the last minute. Many missed out too; sorry, better luck next year; and also, hopefully, we will get a larger venue going forward. I really thank the organizers for this — we affectionately refer to them as the Belconians (i.e. a handful of Perconians based in Belgium). The conversation, the food, the drink —  they were all excellent. It’s good to see representation from all parts of the community: MySQL, Percona, MariaDB, Pythian, and others. So thank you again, Liz, Dimitri, Tom, and Kenny in absentia. I think Tjerk also deserves special mention for always helping (this year with the drinks)

As for FOSDEM itself, beyond the booth, I think the most interesting stuff was the talks. There are video recordings and slides of pretty much all talks, but I will also give you the “Cliff’s Notes” of them here.

MySQL DevRoom talk quick summaries

Beyond WHERE and GROUP BY – Sergei Golubchik

  • EXCEPT is in MariaDB Server 10.3
  • recursive CTEs are good for hierarchical data, graphs, data generation, Turing complete (you can use it to solve Sudoku even)
  • non-recursive CTEs can be an alternative syntax for subqueries in the FROM clause
  • Window functions:
    • Normal: one result per row, depend on that row only
    • Aggregate: one result per group, depending on the whole group
    • Window: one result per row, depending on the whole group
  • System versioned tables with AS OF
  • Aggregate stored functions

MySQL 8.0 Performance: InnoDB Re-Design – Dimitri Kravtchuk

  • Contention-Aware Transactions Scheduling (CATS), since 8.0.3. Not all transactions are equal, FIFO could not be optimal, unblock the most blocking transactions first
  • CATS (VATS) had a few issues, and there were bugs (they thought everything worked since MariaDB Server had implemented it). They spent about 9 months before fixing everything.
  • Where does CATS help? Workloads hitting row lock contentions. You can monitor via SHOW ENGINE INNODB MUTEX.
  • the main problem is because of repeatable read versus read committed transaction isolation on the same workload. You really need to understand your workload when it comes to VATS.

MySQL 8.0 Roles – Giuseppe Maxia

  • Created like a user, granted like privileges. You need to activate them to use them.
  • Before roles, you created a user, then grant, grant, and more grant’s… Add another user? Same deal. Lots of repetitive work and a lot of chances to make mistakes.
  • Faster user administration – define a role, assign it many times. Centralized grant handling – grant and revoke privileges to roles, add/edit all user profiles.
  • You need to remember to set the default role.
  • A user can have many roles; default role can be a list of roles.
  • Roles are users without a login – roles are saved in user tables. This is useful from an account lock/unlock perspective.
  • You can grant a user to a user
  • SET ROLE is for session management; SET DEFAULT ROLE is a permanent assignment of a role for a user. SET ROLE DEFAULT means assign the default role for this user for this session
  • The role_edges table reports which roles are assigned to which users. default_roles keeps track of the current default roles assigned to users. A default role may not exist.

Histogram support in MySQL 8.0 – Øystein Grøvlen

  • You can now do ANALYZE TABLE table UPDATE HISTOGRAM on column WITH n BUCKETS;
  • New storage engine API for sampling (default implementation is full table scan even when sampling)
  • Histogram is stored in a JSON column in the data dictionary. Grab this from the INFORMATION_SCHEMA.
  • Histograms are useful for columns that are not the first column of any index, and used in WHERE conditions of JOIN queries, queries with IN-subqueries, ORDER BY … LIMIT queries. Best fit: low cardinality columns (e.g. gender, orderStatus, dayOfWeek, enums), columns with uneven distribution (skew), stable distribution (do not change much over time)
  • How many buckets? equi-height, 100 buckets should be enough.
  • Histograms are stored in the data dictionary, so will persist over restarts of course.

Let’s talk database optimizers – Vicențiu Ciorbaru

TLS for MySQL at Large Scale – Jaime Crespo

  • Literally took 3 lines in the my.cnf to turn on TLS
  • https://dbtree.wikimedia.org
  • They wanted to do a data centre failover and wanted to ensure replication would be encrypted.
  • They didn’t have proper orchestration in place (MySQL could have this too). Every time OpenSSL or MySQL had to be upgraded, the daemon needed restarting. If there was an incompatible change, you had to sync master/replicas too.
  • The automation and orchestration that Wikipedia uses: https://fosdem.org/2018/schedule/event/cumin_automation/ (it is called Cumin: https://wikitech.wikimedia.org/wiki/Cumin)
  • Server support was poor – OpenSSL – so they had to deploy wmf-mysql and wmf-mariadb of their own
  • Currently using MariaDB 10.0, and looking to migrate to MariaDB 10.1
  • Client library pain they’ve had
  • TLSv1.2 from the beginning (2015).
  • 20-50x slower for actual connecting; the impact is less than 5% for the actual query performance. Just fix client libraries, make them use persistent connections. They are now very interested in ProxySQL for this purpose.
  • https://grafana.wikimedia.org/?orgId=1
  • Monty asks, would a double certificate help? Jaime says sure. But he may not actually use double certificates; might not solve CA issues, and the goal is not to restart the server.
  • Monty wonders why not to upgrade to 10.2? “Let’s talk outside because it’s a much larger question.”

MySQL InnoDB Cluster – Miguel Araújo

  • group replication: update everywhere (multi-master), virtually synchronous replication, automatic server failover, distributed recovery, group reconfiguration, GCS (implementation of Paxos – group communication system). HA is a critical factor.
  • mysqlsh: interactive and batch operations. Document store (CRUD and relational access)
  • admin API in mysqlsh: create & manage clusters, hide complexity of configuration/provisioning/orchestration of the InnoDB clusters. Works with JavaScript and Python
  • Usability. HA out of the box.
  • It’s easy to join a new node; new node goes into recovery mode (and as long as you have all the binary logs, this is easy; otherwise start from a backup)
  • SET PERSIST – run a command remotely, and the configuration is persisted in the server
  • Network flapping? Group replication will just reject the node from the cluster if its flapping too often

Why we’re excited about MySQL 8 – Peter Zaitsev

  • Native data dictionary – atomic, crash safe, DDLs, no more MyISAM system table requirements
  • Fast INFORMATION_SCHEMA
  • utf8mb4 as default character set
  • Security: roles, breakdown of SUPER privileges, password history, faster cached-SHA2 authentication (default), builds using OpenSSL (like Percona Server), skip grants blocks remote connections, logs now encrypted when tablespace encryption enabled
  • Persistent AUTO_INCREMENT
  • auto-managed undo tablespaces – do not use system table space for undo space. Automatically reclaim space on disks.
  • Self-tuning, limited to InnoDB (innodb_dedicated_server to auto-tune)
  • partial in-place update for JSON – update filed in JSON object without full rewrite. Good for counters/statuses/timestamps. Update/removal of element is supported
  • Invisible indexes – test impact of dropping indexes before actually dropping them. Maintained but unused by the optimizer. If not needed or used, then drop away.
  • TmpTable Storage Engine – more efficient storage engine for internal temporary tables. Efficient storage for VARCHAR and VARBINARY columns. Good for GROUP BY queries. Doesn’t support BLOB/TEXT columns yet (this reverts to InnoDB temp table now)
  • Backup locks – prevent operations which may result in inconsistent backups. CHECK INSTANCE FOR BACKUP (something Percona Server has had before)
  • Optimizer histograms – detailed statistics on columns, not just indexes
  • improved cost model for the optimizer – www.unofficialmysqlguide.com
  • Performance schematic – faster (via “fake” indexes), error instrumentation, response time histograms (global & per query), digest summaries
  • select * from sys.session – fast potential replacement for show processlist
  • RESTART (command)
  • SET PERSIST – e.g. change the buffer pool size, and this helps during a restart
  • assumes default storage is SSD now
  • binary log on by default, log_slave_updates enabled by default, and log expires after 30 days by default
  • query cache removed. Look at ProxySQL or some other caching solution
  • native partitioning only – remove partitions from MyISAM or convert to InnoDB
  • resource groups – isolation and better performance (map queries to specific CPU cores; can jail your costly queries, like analytical queries)
  • Feature Requests: better single thread performance, no parallel query support

MySQL Test Framework for Support and Bugs Work – Sveta Smirnova

  • MTR allows you to add multiple connections
  • has commands for flow control

ProxySQL – GTID Consistent Reads – René Cannaò, Nick Vyzas

  • threshold is configurable in increments of 1 second. Replication lag can be monitored with ProxySQL. Want to ensure you don’t have stale reads.
  • Why is GTID important? To guarantee consistently. Auto positioning for restructuring topologies.
  • –session-track-gtids is an important feature which allows sending the GTID for a transaction on the OK packet for a transaction. Not available in MariaDB.
  • There is a ProxySQL Binlog Reader now – GTID information about a MySQL server to all connected ProxySQL instances. Lightweight process to run on your MySQL server.
  • ProxySQL can be configured to enforce GTID consistency for reads on any hostgroup/replication hostgroup.
  • Live demo by René

Turbocharging MySQL with Vitess – Sugu Sougoumarane

  • trend for the cloud: container instances, short-lived containers, tolerate neighbors, discoverability. No good tools yet for Kubernetes.
  • non-ideal options: application sharing, NoSQL, paid solutions, NewSQL (CockroachDB, TiDB, Yugabyte)
  • Vitess: leverage MySQL at massive scale, opensource, 8+ years of work, and multiple production examples
  • Square uses Vitess for Square Cash application.
  • Can MySQL run on Docker? Absolutely, many of the companies do huge QPS on Docker.
  • YouTube does a major re-shard every 2-3 months once. No one notices nowadays when that happens.
  • app server connects to vtgate, and only underneath it’s a bunch of smaller databases with vttablet + mysqld. The lockserver is what makes it run well in the cloud.
  • pluggable architecture with no compromise on performance: monitoring, health check, ACLs, tracing, more.
  • at most, it adds about 2ms overhead to connections
  • Go coding standards are enforced, unit tests with strict coverage requirements, end-to-end tests, Travis, CodeClimate and Netlify. Readability is king.
  • On February 5 2018, it will be a CNCF project. One year of due diligence. They said there was nothing to compare it with. Looked at maturity and contributors. It’s becoming a truly community-owned project! (CNCF to Host Vitess is already live as of