Author: Peter Gulutzan

Reserved Words

In the 1990s C.J.Date said: “The rule by which it is determined within the standard that one key word needs to be reserved while another need not be is not clear to this writer.”
Nothing has changed since then, except there are more reserved words.

No more mysql.proc in MySQL 8.0

MySQL has thrown away the mysql.proc table for version 8.0 Development Releases.
The best explanation that I’ve seen is the one that Dmitry Lenev gave at a conference last October.
To summarize it: mysql.proc and its ilk are non-transactional, redundancy is bad, and MySQL can fix some known bugs by moving over to information_schema tables backed by InnoDB. Of course I approve for a separate reason: mysql.proc is non-standard and therefore it is a mistake.
On the other hand, programmers who have invested some time in using mysql.proc will have some trouble changing their code to use information_schema.routines instead.
Table definition differences
I did a complex left join of the information_schema.columns for
mysql.proc (P) and for information_schema.routines (R) in MySQL 5.7, and saw this.

P_column_name         P_column_type         P_collation_name  R_column_name         R_column_type  R_collation_name
--------------------  --------------------  ----------------  --------------------  -------------  ----------------
db                    char(64)              utf8_bin          ROUTINE_SCHEMA        varchar(64)    utf8_general_ci
name                  char(64)              utf8_general_ci   ROUTINE_NAME          varchar(64)    utf8_general_ci
type                  enum('FUNCTION','PRO  utf8_general_ci   ROUTINE_TYPE          varchar(9)     utf8_general_ci
specific_name         char(64)              utf8_general_ci   SPECIFIC_NAME         varchar(64)    utf8_general_ci
language              enum('SQL')           utf8_general_ci   EXTERNAL_LANGUAGE     varchar(64)    utf8_general_ci
sql_data_access       enum('CONTAINS_SQL',  utf8_general_ci   SQL_DATA_ACCESS       varchar(64)    utf8_general_ci
is_deterministic      enum('YES','NO')      utf8_general_ci   IS_DETERMINISTIC      varchar(3)     utf8_general_ci
security_type         enum('INVOKER','DEFI  utf8_general_ci   SECURITY_TYPE         varchar(7)     utf8_general_ci
param_list            blob                  NULL              NULL                  NULL           NULL
returns               longblob              NULL              NULL                  NULL           NULL
body                  longblob              NULL              NULL                  NULL           NULL
definer               char(77)              utf8_bin          DEFINER               varchar(77)    utf8_general_ci
created               timestamp             NULL              CREATED               datetime       NULL
modified              timestamp             NULL              LAST_ALTERED          datetime       NULL
sql_mode              set('REAL_AS_FLOAT',  utf8_general_ci   SQL_MODE              varchar(8192)  utf8_general_ci
comment               text                  utf8_bin          ROUTINE_COMMENT       longtext       utf8_general_ci
character_set_client  char(32)              utf8_bin          CHARACTER_SET_CLIENT  varchar(32)    utf8_general_ci
collation_connection  char(32)              utf8_bin          COLLATION_CONNECTION  varchar(32)    utf8_general_ci
db_collation          char(32)              utf8_bin          DATABASE_COLLATION    varchar(32)    utf8_general_ci
body_utf8             longblob              NULL              ROUTINE_DEFINITION    longtext       utf8_general_ci
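A query along these lines can produce such a chart. This is only a sketch, not the exact statement I ran; the CASE expression hand-codes the name mapping that the chart shows.

SELECT p.column_name AS p_column_name, p.column_type AS p_column_type,
       p.collation_name AS p_collation_name,
       r.column_name AS r_column_name, r.column_type AS r_column_type,
       r.collation_name AS r_collation_name
FROM information_schema.columns p
LEFT JOIN information_schema.columns r
  ON r.table_schema = 'information_schema'
 AND r.table_name = 'routines'
 AND r.column_name = CASE p.column_name
                     WHEN 'db' THEN 'ROUTINE_SCHEMA'
                     WHEN 'name' THEN 'ROUTINE_NAME'
                     WHEN 'type' THEN 'ROUTINE_TYPE'
                     WHEN 'language' THEN 'EXTERNAL_LANGUAGE'
                     WHEN 'modified' THEN 'LAST_ALTERED'
                     WHEN 'comment' THEN 'ROUTINE_COMMENT'
                     WHEN 'db_collation' THEN 'DATABASE_COLLATION'
                     WHEN 'body_utf8' THEN 'ROUTINE_DEFINITION'
                     ELSE UPPER(p.column_name) END
WHERE p.table_schema = 'mysql' AND p.table_name = 'proc'
ORDER BY p.ordinal_position;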

Remember the above chart is for MySQL version 5.7.
For MySQL 8.0.2 these column definitions have changed:
ROUTINE_SCHEMA: was varchar(64) utf8_general_ci, will be varchar(64) utf8_tolower_ci
ROUTINE_TYPE: was varchar(9) utf8_general_ci, will be enum
DATA_TYPE: was varchar(9), will be longtext
EXTERNAL_LANGUAGE: was varchar(64), will be binary(0)
SQL_DATA_ACCESS: was varchar(64), will be enum
SECURITY_TYPE: was varchar(7), will be enum
CREATED: was datetime, will be timestamp
LAST_ALTERED: was datetime, will be timestamp
DEFINER: was varchar(77) utf8_general_ci, will be varchar(93) utf8_bin
CHARACTER_SET_CLIENT: was varchar(32), will be varchar(64)
COLLATION_CONNECTION: was varchar(32), will be varchar(64)
DATABASE_COLLATION: was varchar(32), will be varchar(64)
… and more changes are possible.
I have included the dirt about column data type and collation so that it’s clear they are never exactly the same. This might affect applications that depend on exact size allocations and precise ordering. But usually it will make no difference to either programmers or end users.
What the chart shows is that mysql.proc.db corresponds to information_schema.routines.ROUTINE_SCHEMA, mysql.proc.name corresponds to information_schema.routines.ROUTINE_NAME, and so on. So if I had a simple SQL statement like

SELECT db, name FROM mysql.proc;
I could convert with ease to

SELECT ROUTINE_SCHEMA, ROUTINE_NAME FROM information_schema.routines;
(By the way I used name rather than specific_name because it’s in the primary key; the value is the same.)
However, three mysql.proc columns — param_list, returns, body — have no corresponding columns in information_schema.routines. Converting them will be more work.
param_list and returns
Let’s try

CREATE FUNCTION fx1(paramx1 INT, paramx2 DOUBLE) RETURNS CHAR(5) RETURN 'xxxxx';
Let’s look at it via mysql.proc (whenever I show mysql.proc I’m using MySQL 5.7):

SELECT param_list, returns FROM mysql.proc WHERE name='fx1';
Result:

+-------------+------------------------+
| param_list  | returns                |
+-------------+------------------------+
| paramx1 int | char(5) CHARSET latin1 |
+-------------+------------------------+
Now let’s look at it with

SELECT DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, CHARACTER_SET_NAME, DTD_IDENTIFIER
FROM information_schema.routines
WHERE routine_name='fx1';
Result:

+-----------+--------------------------+--------------------+----------------+
| DATA_TYPE | CHARACTER_MAXIMUM_LENGTH | CHARACTER_SET_NAME | DTD_IDENTIFIER |
+-----------+--------------------------+--------------------+----------------+
| char      | 5                        | latin1             | char(5)        |
+-----------+--------------------------+--------------------+----------------+
This isn’t too bad — all we have to do (with sql_mode='pipes_as_concat') is concatenate
DATA_TYPE || '(' || CHARACTER_MAXIMUM_LENGTH || ')' || ' CHARSET ' || CHARACTER_SET_NAME
or, even simpler,
DTD_IDENTIFIER || ' CHARSET ' || CHARACTER_SET_NAME
and we’ve got "char(5) CHARSET latin1", the same as what’s in mysql.proc.returns. Using DTD_IDENTIFIER avoids complications with other data types so I’ll always go with it.
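Putting that together, something like this (a sketch, assuming sql_mode='pipes_as_concat' and the fx1 function above) rebuilds the old mysql.proc.returns value:

SET sql_mode = 'pipes_as_concat';
SELECT DTD_IDENTIFIER || ' CHARSET ' || CHARACTER_SET_NAME AS returns
FROM information_schema.routines
WHERE ROUTINE_NAME = 'fx1';
-- note: CHARACTER_SET_NAME is NULL for non-character return types,
-- so a real converter would wrap the CHARSET part in IFNULL()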
It’s trickier to find a substitute for param_list, because parameters aren’t in information_schema.routines at all. We’d find them by saying

SELECT ORDINAL_POSITION, PARAMETER_MODE, PARAMETER_NAME, DTD_IDENTIFIER
FROM information_schema.parameters
WHERE specific_name = 'fx1';
+------------------+----------------+----------------+----------------+
| ORDINAL_POSITION | PARAMETER_MODE | PARAMETER_NAME | DTD_IDENTIFIER |
+------------------+----------------+----------------+----------------+
|                0 | NULL           | NULL           | char(5)        |
|                1 | IN             | paramx1        | int(11)        |
|                2 | IN             | paramx2        | double         |
+------------------+----------------+----------------+----------------+
We don’t need to do anything with parameter #0 (it’s just a copy of what’s in information_schema.routines.returns); we only need to merge parameter #1 and parameter #2 into the main query. Like this (with sql_mode='pipes_as_concat'), but skipping the details we’ve already seen:

SELECT routine_name, routine_body,
(SELECT group_concat(parameter_name || ' ' || dtd_identifier)
FROM information_schema.parameters p
WHERE p.specific_name = outertable.routine_name
AND ordinal_position > 0)
AS param_list
FROM information_schema.routines outertable
WHERE routine_name = 'fx1';
Result:

+--------------+--------------+--------------------------------+
| ROUTINE_NAME | ROUTINE_BODY | param_list                     |
+--------------+--------------+--------------------------------+
| fx1          | SQL          | paramx1 int(11),paramx2 double |
+--------------+--------------+--------------------------------+
In other words, we can get param_list from information_schema.routines by adding a subquery that accesses information_schema.parameters. Notice the assumption that the list will be ordered; I’m depending on a quirk.
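If you don’t want to depend on that quirk, GROUP_CONCAT accepts an ORDER BY clause, so the subquery can order the list explicitly (again a sketch, with sql_mode='pipes_as_concat'):

SELECT group_concat(parameter_name || ' ' || dtd_identifier
                    ORDER BY ordinal_position SEPARATOR ',') AS param_list
FROM information_schema.parameters
WHERE specific_name = 'fx1'
AND ordinal_position > 0;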
body
Let’s try:

CREATE PROCEDURE px1() SELECT _latin1 0xe3;
SELECT body, body_utf8, _latin1 0xe3 FROM mysql.proc WHERE name = 'px1';
Result:

+---------------------+-------------+--------------+
| body                | body_utf8   | _latin1 0xe3 |
+---------------------+-------------+--------------+
| SELECT _latin1 0xe3 | SELECT 0xe3 | ã            |
+---------------------+-------------+--------------+
Now let’s try:

CREATE PROCEDURE px2() SELECT 'abc''def';
SELECT body, body_utf8 FROM mysql.proc WHERE name = 'px2';
Result:

+-------------------+------------------+
| body              | body_utf8        |
+-------------------+------------------+
| SELECT 'abc''def' | SELECT 'abc'def' |
+-------------------+------------------+
So you can see that body and body_utf8 are different. In a sense, both are correct — body is what you want if you are going to make a copy of the routine, body_utf8 is what you want if you want to see what the output would look like if you invoked the routine. So it’s pretty useful that mysql.proc has both.
Unfortunately, information_schema.routines does not. It has no equivalent of body. It only has an equivalent of body_utf8.
Loyal readers may recall that I’ve talked before about the possible information losses when making everything UTF8 but this is even worse. Without an equivalent of mysql.proc.body, you cannot reliably make exact copies of routines even if they are in UTF8.
Privilege differences
Actually the change amounts to more than just the differences between the definitions of the tables. There’s also the fact that PRIVILEGES are different — you can GRANT on mysql.* tables, but you cannot GRANT on information_schema.* tables. And what users can see differs depending on how granting is done.
Therefore, for all users who currently hold a SELECT privilege on mysql.proc, we are going to have to work around the problem that there is no way to grant the exact same privilege on information_schema.routines. In other words, if the old (MySQL 5.7) statement was

GRANT SELECT ON mysql.proc TO 'jean'@'localhost';
To be the same as that, you need a way to let jean see all the columns in all the routines, but not anything other than the routines. This is possible with the DEFINER clause in routines and views. For example, assuming ‘root’@’localhost’ is a powerful user:

CREATE
DEFINER = 'root'@'localhost'
SQL SECURITY DEFINER
VIEW v_routines
AS SELECT * FROM information_schema.routines;
GRANT SELECT ON v_routines TO 'jean'@'localhost';
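With that in place, jean can read routine metadata through the view, for example:

SELECT ROUTINE_SCHEMA, ROUTINE_NAME FROM v_routines;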
SHOW
Quote from MySQL 5.7 manual describing SHOW CREATE PROCEDURE and SHOW CREATE FUNCTION:
“To use either statement, you must be the user named in the routine DEFINER clause or have SELECT access to the mysql.proc table.”
Quote from MySQL 8.0 manual describing SHOW CREATE PROCEDURE and SHOW CREATE FUNCTION:
“To use either statement, you must have the global SELECT privilege.”
Essentially, the privilege requirement in 5.7 is what’s needed for looking at mysql.proc, but the privilege requirement in 8.0 is what’s needed for looking at information_schema.routines.
But this time we can’t work around by creating a view. If I’m interpreting “global SELECT privilege” correctly, the expectation is that if you want to say SHOW CREATE PROCEDURE, you need

GRANT SELECT ON *.* TO user_who_wants_to_show_create_procedure;
Either I’m misinterpreting, or MySQL is demanding that you grant a very broad privilege for a very narrow requirement.
This is too bad because, though SHOW statements are junk, this one will be necessary in MySQL 8.0. That’s because it has the body value right: it does not turn SELECT 'abc''def' into SELECT 'abc'def' and so on. Thus, it is the only way to get the equivalent of MySQL 5.7’s mysql.proc.body value. Using a connector, you can put this correct value into another table with something like this, leaving out details:

mysql_real_query("SHOW CREATE PROCEDURE p;")
mysql_fetch_row() /* there's only going to be one row at most */
mysql_fetch_field() /* the third field is `Create Procedure` */
mysql_real_query("UPDATE copy_of_routines") /* set body value */
Other ways to get lists of routines
Although MySQL 8.0 is missing mysql.proc, it isn’t missing all the grant-related tables (yet). So you can still say

SELECT * FROM mysql.procs_priv;
Not only that, you can still use SHOW PROCEDURE STATUS — and then put the results in a @variable! As far as I know this is undocumented, but it’s been around for years and nobody has bothered to disallow it.
Try this:

SET @procedurelist = '';
SHOW PROCEDURE STATUS WHERE (@procedurelist := CONCAT(@procedurelist, `Name`, ','));
SELECT @procedurelist;
Result:

+------------------------------------------------------ ...
| create_synonym_db,diagnostics,execute_prepared_stmt, ...
+------------------------------------------------------ ...
Demonstrably, the @procedurelist variable now has a list of all procedures. Its only recommended use is to show unbelievers that with MySQL all things are possible.
Effect on ocelotgui debugger
As you might have guessed by now, we have a program that uses mysql.proc, namely the Ocelot open source GUI client for MySQL and MariaDB (ocelotgui). Well, one of its feature components is a debugger for MySQL stored routines, and you can see all the “mysql.proc” references in our source code for that module.
Our plan is: wait and see if MySQL makes the transition easier. But if that doesn’t happen, within a few months we’ll change ocelotgui to implement all the methods that I’ve described. So if you want to see how it’s done in detail, with an actual working application, just watch the code related to mysql.proc until it changes. It’s on github.

Pronouncing Database Terms

It is the business of educated people to speak so that no-one may be able to tell in what county their childhood was passed. — A. Burrell, A Handbook for Teachers in Public Elementary School, 1891
The terms that reveal where a person (mis)spent a DBMS-related childhood are “char”, “data”, “GIF”, “gigabyte”, “GUI”, “JSON”, “query”, “schema”, “tuple”, “_”, “`”, and “«”.
CHAR
(1) Like “Care” because it’s short for “Character” (so hard C and most folks say “Character” that way)?
(2) Like “Car” because it’s short for “Character” (so hard C and a few folks in the British Isles say it that way and perhaps all other English words ending in consonant + “ar” are pronounced that way)?
(3) Like “Char” (the English word for a type of trout)?
C/C++ programmers say (3), for example Bjarne Stroustrup of C++ fame says that’s illogical but usual. However, people who have not come to SQL from another programming language may be more likely to go with (2), leading one online voter to exclaim “I’ve known a lot of people who say “car” though. (Generally SQL-y people; is this what they teach in DBA classes?)” and Tom Kyte of Oracle fame reportedly says “var-car”.
DATA
The Oxford English Dictionary (OED) shows 4 (four!) variations:
“Brit. /ˈdeɪtə/, /ˈdɑːtə/, U.S. /ˈdædə/, /ˈdeɪdə/”.
It’s only the first syllable that matters — DAY or DA?
I haven’t seen the Longman Pronunciation Dictionary, but a blog post says the results of Longman’s preference poll were:
“BrE: deɪtə 92% ˈdɑːtə 6% ˈdætə 2% AmE: ˈdeɪțə 64%ˈdæțə 35% ˈdɑːțə 1%” (notice it’s ț not t for our American friends). By the way OED says in a computing context it’s “as a mass noun” so I guess “data is” is okay.
GIF
It’s “jif”, says its creator.
GIGABYTE
That letter at the start is a hard G; The “Jigabyte” pronunciation is unknown to Merriam-Webster, Cambridge, and Oxford dictionaries.
GUI
No question it’s “gooey”, for all the dictionaries I checked. So pronounce our product not as “osselot-goey” but as “osselot-gooey”.
GUID
The author of “Essential COM” says
The exact pronunciation of GUID is a subject of heated debate among COM developers. Although the COM specification says that GUID rhymes with fluid, the author [Don Box] believes that the COM specification is simply incorrect, citing the word languid as setting the precedent.
The COM specification is a standard and therefore cannot be incorrect, but I can’t find it, and I like setting-a-precedent games, so let’s use the exact word Guid, eh? It appears in Hugh MacDiarmid’s masterpiece “A Drunk Man Looks At The Thistle”
But there are flegsome deeps
Where the soul o’Scotland sleeps
That I to bottom need
To wauk Guid kens what deid
… which proves that Guid is a one-syllable word, though doubtless MacDiarmid pronounced it “Gweed”.
JSON
Doug Crockford of Yahoo fame, seen on Youtube, says:
So I discovered JAYsun. Java Script Object Notation. There’s a lot of argument about how you pronounce that. I strictly don’t care. I think probably the correct pronunciation is [switching to French] “je sens”.
The argument is mostly between people who say JAYsun and people who say JaySAWN. It’s controversial, and in our non-JSON environment it’s a foreign word, so spelling it out J S O N is safe and okay.
QUERY
In the 1600s the spelling was “quaery”, so it must have rhymed with “very”, and it still does, for some Americans. But the OED says that both American and British speakers say “QUEERie” nowadays.
SCHEMA
It’s “Skema”. The “Shema” pronunciation is unknown to Merriam-Webster, Cambridge, and Oxford dictionaries.
SQL
See the earlier post “How to pronounce SQL” which concluded:
In the end, then, it’s “when in Rome do as the Romans do”. In Microsoft or Oracle contexts one should, like Mr Ellison, respect Microsoft’s or Oracle’s way of speaking. But here in open-source-DBMS-land the preference is to follow the standard.
TUPLE
See the earlier post “Tuples”. It’s “Tuhple”.
_
According to Swan’s “Practical English Usage” the _ (Unicode code point 005F) character is more often called underline by Britons, more often called underscore by Americans. (The SQL-standard term is underscore.) (The Unicode term is LOW LINE; SPACING UNDERSCORE was the old Unicode-version-1.0 term.)
` `
This is a clue for telling if people have MySQL backgrounds — they’ll pronounce the ` (Unicode code point 0060) symbol as “backtick”. Of course it also is found in other programming contexts nowadays, but there are lots of choices in the Jargon File:
Common: backquote; left quote; left single quote; open quote; grave. Rare: Backprime; [backspark]; unapostrophe; birk; blugle; back tick; back glitch; push; quasiquote.
By the way The Jargon File is a good source for such whimsical alternatives of ASCII names.
« »
You might be fooled by an Adobe error, as I was, into thinking that these French-quote-mark thingies are pronounced GEELmoes. Wrong. They are GEELmays. (The Unicode term is left-pointing or right-pointing double angle quotation mark.) This matter matters because, as Professor Higgins said, “The French don’t care what they do actually, as long as they pronounce it properly.”
Meanwhile …
Enhancements made to the source code for the next version of ocelotgui, Ocelot’s Graphical User Interface for MySQL and MariaDB, are: error messages are optionally in French, and grid output is optionally in HTML. As always, the description of the current version is on ocelot.ca and the downloadable source and releases are on github.

The First Development Milestone for MySQL 8.0

MySQL 8.0.0 exists.
For general impressions we already have comments by Giuseppe Maxia and Stewart Smith and Serdar Yegulalp.
Two new features looked important to me: modern UCA collations, and roles. So I downloaded and tried them out.
Modern UCA collations
MySQL is going to switch to utf8mb4 for the default character set and add collations based on the latest version of the Unicode Collation Algorithm (UCA 9.0.0). I still see messages indicating the default is still latin1, but they’re incorrect: I can put 4-byte UTF-8 characters in columns that I created without explicitly saying utf8mb4.
The new collations are only for utf8mb4. That’s suboptimal. People still have good reasons to use other character sets (I discussed some of them in an earlier blog post). And in any case, a collation that works for utf8mb4’s repertoire will work for every character set that has a pure subset of that repertoire, which is to say, every character set.
The new collations are only for the generic “Default Unicode Collation Element Table” (DUCET), and for Latin-based alphabets. So there are no updates for “persian” or “sinhala”.
For an example, the following table shows changes between the old Swedish collation (utf8mb4_swedish_ci) and the new one (utf8mb4_sv_0900_ai_ci). The “Rule” column has what Unicode says about certain Swedish primary (level-1) comparisons, the “Example” column has what an SQL comparison would look like, the “Old” column has the results I got with utf8mb4_swedish_ci, the “New” column has the results I got with utf8mb4_sv_0900_ai_ci.
Rule                          Example      Old    New
----------------------------  -----------  -----  ----
ETH = D                       'Ð' = 'D'    FALSE  TRUE
D STROKE = D                  'Đ' = 'D'    FALSE  TRUE
THORN = TH                    'Þ' = 'TH'   FALSE  TRUE
O DOUBLE ACUTE = O DIAERESIS  'Ő' = 'Ö'    FALSE  TRUE
U DOUBLE ACUTE = Y            'Ű' = 'Y'    FALSE  TRUE
L STROKE = L                  'Ł' = 'L'    FALSE  TRUE
A DIAERESIS = E OGONEK        'Ä' = 'Ę'    FALSE  TRUE
OE = O DIAERESIS              'Œ' = 'Ö'    FALSE  TRUE
O CIRCUMFLEX = O DIAERESIS    'Ô' = 'Ö'    FALSE  TRUE
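To check a row of this chart yourself, a query like the following works (a sketch, assuming a utf8mb4 connection on a server that has both collations; MySQL shows the results as 1 or 0 rather than TRUE or FALSE):

SELECT 'Ð' = 'D' COLLATE utf8mb4_swedish_ci    AS old_result,
       'Ð' = 'D' COLLATE utf8mb4_sv_0900_ai_ci AS new_result;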
Most Swedes don’t know about these rules; they apply to medieval texts or foreign names. But most Swedes do know that rules should cover the edge cases, not just the Stockholm phone book. Because it follows the Unicode rules, the new collation is better.
But the new collation’s name is worse, for two reasons.
(1) The “_ai” suffix, meaning “accent insensitive”, is Microsoftish. There is such a thing, but the definition of “accent” varies between languages and the factors that influence collations can be other things besides accents. Clearer suffixes for extra-sensitive collation names would be “w2” or “l2” (for weight=2 or level=2), and they’re for sorting rather than searching unless you’re Japanese, but a default = no-suffix-for-accents would have been okay.
(2) The “_sv” suffix, meaning “Swedish”, is an unnecessary change. Compatibility with the previous suffix — “swedish” — would not have violated UCA specifications and would have been clearer for people who have used MySQL before.
For a second example, I looked at the new “Latin” collation, utf8mb4_la_0900_ai_ci. This time I couldn’t find any rules file in the Unicode standard directory. There is a UCA chart for Latin but utf8mb4_la_0900_ai_ci obviously isn’t following it at all. Instead it’s like MySQL’s old and silly “Roman” collation, where i=j and u=v. This is not an important collation. But MySQL claims the new collations follow UCA rules, and here is one that doesn’t, so I worry about the others.
This has to be taken in context — MySQL has far better support for character sets and collations than any other open-source DBMS, except sometimes MariaDB. And now it’s a weenie bit more far better. Observations about paucity of new UCA collations, bad names, or standard non-compliance won’t change that fact.
Roles
I discussed MariaDB’s roles in 2014. MySQL’s roles are already in the 8.0 documentation. Is MySQL making an improvement?
The first thing I noticed is that the syntax rules for roles are, too often, the same as the syntax rules for users. This is especially obvious when I ask for things that make no sense for roles, for example:

mysql>CREATE ROLE 'r'@'host';
OK 0 rows affected (0.1 seconds)

mysql>CREATE ROLE '';
Error 1396 (HY000) Operation CREATE USER failed for anonymous user

mysql>GRANT 'root'@'localhost' TO role_name;
OK 0 rows affected (0.1 seconds)

mysql>DROP USER role_name;
OK 0 rows affected (0.1 seconds)

Because of this, some non-standard limitations exist: maximum name length is 32, names are case sensitive, role names cannot be the same as user names, and there is no separate information_schema table.
However, the DML statements that I tested for MariaDB do work with MySQL as well, and are often exactly the same:

MariaDB: CREATE [OR REPLACE] ROLE [IF NOT EXISTS] role_name [WITH ADMIN …];
MySQL: CREATE ROLE [IF NOT EXISTS] role_name [,role_name…];

MariaDB: DROP ROLE [IF EXISTS] role_name [,role_name…];
MySQL: DROP ROLE [IF EXISTS] role_name [,role_name…];

MariaDB: SET DEFAULT ROLE {role_name|NONE} [FOR user_name];
MySQL: SET DEFAULT ROLE ALL TO user_name [,user_name…];

MariaDB: SET ROLE {role_name|NONE};
MySQL: SET ROLE {role_name|NONE};

MariaDB: SELECT CURRENT_ROLE() | CURRENT_ROLE;
MySQL: SELECT CURRENT_ROLE();

MariaDB: [no exact equivalent]
MySQL: GRANT CREATE ROLE ON *.* TO grantee;

MariaDB: SHOW GRANTS [FOR role_name];
MySQL: SHOW GRANTS [FOR role_name];
MySQL: SHOW GRANTS [FOR user_name USING role_name[,role_name…]];

MariaDB: GRANT role_name TO grantee [,grantee…] [WITH ADMIN OPTION];
MySQL: GRANT role_name[,role_name…] TO grantee [,grantee…];

(The last GRANT example surprised me. MariaDB has trouble granting multiple roles in one statement; it’s Bug#5772. MySQL appears to be “solving” it by making certain role names illegal unless they’re delimited; I’m not sure that’s the right way to solve it.)
Circular roles (GRANT r1 TO r2; GRANT r2 TO r1;) are allowed but I expect they’ll be disallowed in a later version.
Example:

/* as a user with lots of privileges */
CREATE USER 'u'@'localhost';
CREATE ROLE r;
CREATE TABLE t1 (s1 INT);
CREATE TABLE t2 (s1 INT);
GRANT SELECT ON t1 TO r;
GRANT r TO 'u'@'localhost';
/* as user 'u'@'localhost' */
SET ROLE r;
SELECT * FROM t1;
SELECT * FROM t2;
/* The first SELECT succeeds, the second SELECT fails. */

To generalize: so far MySQL 8.0.0 allows creation of roles but they have to look like users. So the syntax is undesirable, but they work properly.
Again, remember the context. There’s nothing wrong with a feature that’s not ready, until MySQL declares that it’s ready.
Typos
MySQL’s announcement, buried in a section about minor fixes, says “Foreign key names as stored in the foreign_keys and foreign_key_column_usage tables are a maximum of 64 characters, per the SQL standard”. Er, up to a point. The SQL standard says “In a regular identifier, the number of identifier parts shall be less than 128.”
Us Too
We have a new-version announcement too. Version 1.0.3 of the Ocelot Graphical User Interface (ocelotgui) for MySQL and MariaDB came out on Tuesday September 27 2016. Some new items are …
As well as getting result sets in the result-set widget, one can get them added to the history widget, with the same format as what the mysql client outputs.
As well as predicting what the next word should be, Ocelot’s syntax recognizer makes it possible to show hints if the user hovers over a word.
Finally, there is a rudimentary formatter. Clicking the edit menu item Edit|Format will change indentation, make keywords upper case, etc. I say “rudimentary” because, without a standard to follow, one must depend on taste, and nobody shares the taste that’s on display here.
Documentation is now on ocelot.ca/index.htm. C++ source and Linux-ready packages are on github.com/ocelot-inc/ocelotgui.

Binary Serializers

DBMS client applications need to store SQL query results in local memory or local files. The format is flat and the fields are ordered — that’s “serialization”. The most important serializer format uses human-readable markup, like
[start of field] [value] [end of field]
and the important ones in the MySQL/MariaDB world are CSV (what you get with SELECT … INTO OUTFILE or LOAD DATA INFILE), XML (what you get with --xml or LOAD XML), and JSON (for which there are various solutions if you don’t use MySQL 5.7).
The less important serializer format uses length, like
[length of value] [value]
and this, although it has the silly name “binary serialization”, is what I want to talk about.
The length alone isn’t enough, we also need to know the type, so we can decode it correctly. With CSV there are hints such as “is the value enclosed in quotes”, but with binary serializers the value contains no hints. There has to be an indicator that says what the type is. There might be a single list of types for all of the records, in which case the format is said to “have a schema”. Or there might be a type attached to each record, like
[type] [length of value] [value]
in which case the format is often called “TLV” (type-length-value).
Binary serializers are better than markup serializers if you need “traversability” — the ability to skip to field number 2 without having to read every byte in field number 1. Binary TLV serializers are better than binary with-schema serializers if you need “flexibility” — when not every record has the same number of fields and not every field has the same type. But of course TLV serializers might require slightly more space.
A “good” binary serializer will have two Characteristics:
#1 It is well known, preferably a standard with a clear specification, but otherwise a commonly-used format with a big sponsor. Otherwise you have to write your own library and you will find out all the gotchas by re-inventing a wheel. Also, if you want to ship your file for import by another application, it would be nice if the other application knew how to import it.
#2 It can store anything that comes out of MySQL or MariaDB.
Unfortunately, as we’ll see, Characteristic #1 and Characteristic #2 are contradictory. The well-known serializers usually were made with the objective of storing anything that comes out of XML or JSON, or that handled quirky situations when shipping over a wire. So they’re ready for things that MySQL and MariaDB don’t generate (such as structured arrays) but not ready for things that MySQL and MariaDB might generate (such as … well, we’ll see as I look at each serializer).
To decide “what is well known” I used the Wikipedia article Comparison of data serialization formats. It’s missing some formats (for example sereal) but it’s the biggest list I know of, from a source that’s sometimes neutral. I selected the binary serializers that fit Characteristic #1. I evaluated them according to Characteristic #2.
I’ll look at each serializer. Then I’ll show a chart. Then you’ll draw a conclusion.
Avro
Has schemas. Not standard but sponsored by Apache.
I have a gripe. Look at these two logos. The first one is for the defunct British/Canadian airplane maker A.V.Roe (from Wikipedia). The second one is for the binary serializer format Apache Avro (from their site).

Although I guess that the Apache folks somehow have avoided breaking laws, I think that taking A.V.Roe’s trademark is like wearing medals that somebody else won. But putting my gripe aside, let’s look at a technical matter.

The set of primitive type names is:
null: no value

Well, of course, in SQL a NULL is not a type, it is a value. This is not a showstopper, because I can declare a union of a null type and a string type if I want to allow nulls and strings in the same field. Um, okay. But then comes the encoding rule:
null is written as zero bytes.
I can’t read that except as “we’re like Oracle 12c, we think empty strings are NULLs”.
ASN.1
TLV. Standard.
ASN means “abstract syntax notation” but there are rules for encoding too, and ASN.1 has a huge advantage: it’s been around for over twenty years. So whenever any “why re-invent the wheel?” argument starts up on any forum, somebody is bound to ask why all these whippersnapper TLVs are proposed considering ASN.1 was good enough for grand-pappy, eh?
Kidding aside, it’s a spec that’s been updated as recently as 2015. As usual with official standards, it’s hard to find a free-and-legitimate copy, but here it is: the link to a download of “X.690 (08/2015) ITU-T X.690 | ISO/IEC 8825-1 ISO/IEC 8825-1:2015 Information technology — ASN.1 encoding rules: Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER)” from the International Telecommunication Union site: http://www.itu.int/rec/T-REC-X.690-201508-I/en.
It actually specifies how to handle exotic situations, such as
** If it is a “raw” string of bits, are there unused bits in the final byte?
** If the string length is greater than 2**32, is there a way to store it?
** Can I have a choice between BMP (like MySQL UCS2) and UTF-8 and other character sets?
** Can an integer value be greater than 2**63?
… You don’t always see all these things specified except in ASN.1.
Unfortunately, if you try to think of everything, your spec will be large and your overhead will be large, so competitors will appear saying they have something “simpler” and “more compact”. Have a look at trends.google.com to see how ASN.1 once did bestride the narrow world like a colossus, but nowadays is not more popular than all the others.
BSON
TLV. Sponsored by MongoDB.
Although BSON is “used mainly as a data storage and network transfer format in the MongoDB [DBMS]”, anybody can use it. There’s a non-Mongo site which refers to independent libraries and discussion groups.
BSON is supposed to make you think “binary JSON” but in fact all the binary serializers that I’m discussing (and a few that I’m not discussing, such as UBJSON) can do a fair job of representing JSON-marked-up-text in binary format. Some people even claim that MessagePack does a better job of that than BSON does.
There is a “date” but it is milliseconds since the epoch, so it might be an okay analogue for MySQL/MariaDB TIMESTAMP but not for DATETIME.
CBOR
TLV. Proposed standard.
CBOR is not well known but there’s an IETF Internet Standards Document for it (RFC 7049 Concise Binary Object Representation), so I reckoned it’s worth looking at. I don’t give that document much weight, though — it has been in the proposal phase since 2013.
The project site page mentions JSON data model, schemalessness, raw binary strings, and concise encoding — but I wanted to see distinguishing features. There are a few.
I was kind of surprised that there are two “integer” types: one type is positive integers, the other type is negative integers.
In other words -5 is
[type = negative number] [length] [value = 5]
rather than the Two’s Complement style
[type = signed number] [length] [value = -5]
but that’s just an oddness rather than a problem.
There was an acknowledgment in the IETF document that “CBOR is inspired by MessagePack”. But one of MessagePack’s defects (the lack of a raw string type) has been fixed now. That takes away one of the reasons that I’d have for regarding CBOR as a successor to MessagePack.
Fast Infoset
TLV. Uses a standard.
After seeing so much JSON, it’s nice to run into an international standard that “specifies a binary encoding format for the XML Information Set (XML Infoset) as an alternative to the XML document format”. Okay, they get points for variety.
However, it’s using ASN.1’s underlying encoding methods, so I won’t count it as a separate product.
MessagePack
TLV. Not standard but widely used.
MessagePack, also called MsgPack, is popular and is actually used as a data storage format for Pinterest and Tarantool.
It’s got a following among people who care a lot about saving bytes; for example see this Uber survey where MessagePack beat out some of the other formats that I’m looking at here.
One of the flaws of MessagePack, from my point of view, is its poor handling for character sets other than UTF-8. But I’ll admit: when MessagePack’s original author is named Sadayuki Furuhashi, I’m wary about arguing that back in Japan UTF-8 is not enough. For some of the arguing that happened about supporting other character sets with MessagePack, see this thread. Still, I think my “The UTF-8 world is not enough” post is valid for the purposes I’m discussing.
And the maximum length of a string is 2**32-1 bytes, so you can forget about dumping a LONGBLOB. I’d have the same trouble with BSON but BSON allows null-terminated strings.
OPC-UA
TLV. Sort of a standard for a particular industry group.
Open Platform Communications – Unified Architecture has a Binary Encoding format.
Most of the expected types are there: boolean, integer, float, double, string, raw string, and datetime. The datetime description is a bit weird though: number of 100 nanosecond intervals since January 1, 1601 (UTC). I’ve seen strange cutover dates in my time, but this is a new one for me.
For strings, there’s a way to indicate NULLs (hurrah).
I have the impression that OPC is an organization for special purposes (field devices, control systems, etc.) and I’m interested in general-purpose formats, so didn’t look hard at this.
Protocol Buffers
Has schemas. Not standard but sponsored by Google.
Like Avro, Google’s Protocol Buffers have a schema for the type and so they are schema + LV rather than TLV. But MariaDB uses them for its Dynamic Columns feature, so everybody should know about them.
Numbers and strings can be long, but there’s very little differentiation — essentially you have integers, double-precision floating point numbers, and strings. So, since I was objecting earlier when I saw that other serialization formats didn’t distinguish (say) character sets, I have to be fair and say: this is worse. When the same “type” tag can be used for multiple different types, it’s not specific enough.
Supposedly the makers of Protocol Buffers were asked why they didn’t use ASN.1 and they answered “We never heard of it before”. That’s from a totally unreliable biased source but I did stop and ask myself: is that really so unbelievable? In this benighted age?
Thrift
Can be TLV but depends on protocol. Not standard but sponsored by Apache, used a lot by Facebook.
I looked in vain for what one might call a “specification” of Thrift’s binary serialization, and finally found an old stackoverflow discussion that said: er, there isn’t any. There’s a “Thrift Missing Guide” that tells me the base types, and a Java class describer for one of the protocols to help me guess the size limits.
Thrift’s big advantage is that it’s language neutral, which is why it’s popular and there are many libraries and high-level tutorials. That makes it great as a communication format, which is what it’s supposed to be. However, the number of options is small and the specification is so vague that I can’t call it “good” according to the criteria I stated earlier.
The Chart
I depend on each serializer’s specification, I didn’t try anything out, I could easily have made some mistakes.
For the “NULL is a value” row, I say No (and could have added “Alackaday!”) for all the formats that say NULL is a data type. Really the only way to handle NULL is with a flag so this would be best:
[type] [length] [flag] [value]
and in fact, if I was worried about dynamic schemas, I’d be partial to Codd’s “two kinds of NULLs” arguments, in case some application wanted to make a distinction between not-applicable-value and missing-value.
For most of the data-type rows, I say Yes for all the formats that have explicit defined support. This does not mean that it’s impossible to store the value — for example it’s easy to store a BOOLEAN with an integer or with a user-defined extension — but then you’re not using the format specification so some of its advantages are lost.
For dates (including DATETIME, TIMESTAMP, DATE, etc.), I did not worry if the precision and range were less than what MySQL or MariaDB can handle. But for DECIMAL, I say No if the maximum number of digits is 18 or if there are no post-decimal digits.
For LONGBLOB, I say No if the maximum number of bytes is 2**32.
For VARCHAR, I say Yes if there’s any way to store any encoded characters (rather than just bytes, which is what BINARY and BLOB are). In the “VARCHAR+” row I say Yes if there is more than one character set, although this doesn’t mean much — the extra character sets don’t match with MySQL/MariaDB’s variety.
I’ll say again that specifications allow for “extensions”, for example with ASN.1 you can define your own tags, but I’m only looking at what’s specific in the specification.

                 Avro  ASN.1  BSON  CBOR  MessagePack  OPC UA  Protocol Buffers  Thrift
NULL is a value  no    no     no    no    no           YES     no                no
BOOLEAN          YES   YES    YES   YES   YES          YES     no                YES
INTEGER          YES   YES    YES   YES   YES          YES     YES               YES
BIGINT           YES   YES    YES   YES   YES          YES     YES               YES
FLOAT            YES   YES    YES   YES   YES          YES     no                no
DOUBLE           YES   YES    YES   YES   YES          YES     YES               YES
BINARY / BLOB    YES   YES    YES   YES   YES          YES     YES               YES
VARCHAR          YES   YES    YES   YES   YES          YES     no                YES
Dates            no    YES    YES   YES   no           YES     no                no
LONGBLOB         YES   YES    no    YES   no           no      YES               no
DECIMAL          no    YES    no    YES   no           no      no                no
VARCHAR+         no    YES    no    no    no           YES     no                no
BIT              no    YES    no    no    no           no      no                no

Your Conclusion
You have multiple choice:
(1) Peter Gulutzan is obsessed with standards and exactness.
(2) Well, might as well use one of these despite its defects.
(3) We really need yet another binary serializer format.
ocelotgui news
Recently there were some changes to the ocelot.ca site to give more prominence to the ocelotgui manual, and a minor release — ocelotgui version 1.02 — happened on August 15.

SQLite and Standard SQL

I’m going to need to use SQLite syntax for a project that I’m involved with, and predictably I wonder: how standard is it? The SQLite folks themselves make modest claims to support most of the features with a special focus on SQL-92, but (a) I like to do my own counting, and (b) there’s no official standard named SQL-92 because it was superseded 17 years ago.
By ignoring SQL-92 claims I eschew use of the NIST test suite. I’ll be far less strict and more arbitrary: I’ll go through SQL:2011’s “Feature taxonomy and definition for mandatory features”. For each feature in that list, I’ll come up with a simple example SQL statement. If SQLite appears to handle the example, I’ll mark it “Okay”, else I’ll mark it “Fail”. I’m hoping that arbitrariness equals objectivity, because the unfair pluses should balance the unfair minuses.
Skip to the end of this blog post if you just want to see the final score.
Standard SQL Core Features, Examples, and Okay/Fail Results
E-011 Numeric data types
E-011-01 INTEGER and SMALLINT
Example: create table t (s1 int);
Fail. A numeric column can contain non-numeric strings. There is a similar flaw for all data types, but let’s count them all as only one fail.
E-011-02 REAL, DOUBLE PRECISON,and FLOAT data types
Example: create table tr (s1 float);
Okay.
E-011-03 DECIMAL and NUMERIC data types
Example: create table td (s1 numeric);
Okay, although: when there are many post-decimal digits there is a switch to exponential notation, for example after “insert into t3 values (0.0000000000000001);” and “select * from t3” I get “1.0e-16”. I regard this as a display flaw rather than a fail.
E-011-04 Arithmetic operators
Example: select 10+1,9-2,8*3,7/2 from t;
Okay. SQLite is wrong to calculate that 7/0 is NULL, though.
E-011-05 Numeric comparison
Example: select * from t where 1 < varchar_column;
Okay, but only because SQLite doesn’t distinguish between character data types.
E021-11 POSITION function
Example: select position(x in y) from z;
Fail. There is no such function.
E021-02 Character comparison
Example: select * from t where s1 > 'a';
Okay. I should note here that comparisons are case sensitive, and it is devilishly hard to change this except with ASCII, but case insensitivity is not a requirement for this feature.
E031 Identifiers
E031-01 Delimited
Example: create table "t47" (s1 int);
Fail. Although I can enclose identifiers inside double quotes, that doesn’t make them case sensitive.
E031-02 Lower case identifiers
Example: create table t48 (s1 int);
Okay.
E031-03 Trailing underscore
Example: create table t49_ (s1 int);
Okay.
E051 Basic query specification
E051-01 SELECT DISTINCT
Example: select distinct s1 from t;
Okay.
E051-02 GROUP BY clause
Example: select distinct s1 from t group by s1;
Okay.
E051-04 GROUP BY can contain columns not in select list
Example: select s1 from t group by lower(s1);
Okay.
E051-05 select list items can be renamed
Example: select s1 as K from t order by K;
Okay.
E051-06 HAVING clause
Example: select s1,count(*) from t having s1 > (select s1 from t);
Fail. There was more than one row in the subquery result set, but SQLite didn’t return an error.
E061-11 Subqueries in IN predicate
Example: select * from t where s1 in (select s1 from t);
Okay.
E061-12 Subqueries in quantified comparison predicate
Example: select * from t where s1 >= all (select s1 from t);
Fail. Syntax error.
E061-13 Correlated subqueries
Example: select * from t where s1 = (select s1 from t2 where t2.s2 = t.s1);
Okay.
E061-14 Search condition
Example: select * from t where 0 <> 0 or 'a' <> 0;
Okay.
E091-03 MAX
Example: select max(s1) from t7 where s1 > 0;
Okay.
E091-04 MIN
Example: select min(s1) from t7 where s1 > 0;
Okay.
E091-05 SUM
Example: select sum(1) from t7 where s1 > 0;
Okay.
E091-06 ALL quantifier
Example: select sum(all s1) from t7 where s1 > 0;
Okay.
E091-07 DISTINCT quantifier
Example: select sum(distinct s1) from t7 where s1 > 0;
Okay.
E101 Basic data manipulation
E101-01 INSERT statement
Example: insert into t (s1) values (''),(null),(55);
Okay.
E101-03 Searched UPDATE statement
Example: update t set s1 = null where s1 in (select s1 from t2);
Okay.
E101-04 Searched DELETE statement
Example: delete from t where s1 in (select s1 from t);
Okay.
E111 Single row SELECT statement
Example: select count(*) from t;
Okay.
E121 Basic cursor support
E121-01 DECLARE CURSOR
Fail. SQLite doesn’t support cursors.
E121-02 ORDER BY columns need not be in select list
Example: select s1 from t order by s2;
Okay. Update on 2016-06-27: Originally I wrongly said “Fail”, see the comments.
E121-03 Value expressions in select list
Example: select s1 from t7 order by -s1;
Okay.
E121-04 OPEN statement
Fail. SQLite doesn’t support cursors.
E121-06 Positioned UPDATE statement
Fail. SQLite doesn’t support cursors.
E121-07 Positioned DELETE statement
Fail. SQLite doesn’t support cursors.
E121-08 CLOSE statement
Fail. SQLite doesn’t support cursors.
E121-10 FETCH statement implicit next
Fail. SQLite doesn’t support cursors.
E121-17 WITH HOLD cursors
Fail. SQLite doesn’t support cursors.
E131 Null value support (nulls in lieu of values)
Example: select s1 from t7 where s1 is null;
Okay.
E141 Basic integrity constraints
E141-01 NOT NULL constraints
Example: create table t8 (s1 int not null);
Okay.
E141-02 UNIQUE constraints of NOT NULL columns
Example: create table t9 (s1 int not null unique);
Okay.
E141-03 PRIMARY KEY constraints
Example: create table t10 (s1 int primary key);
Okay, although SQLite wrongly assumes s1 is auto-increment.
E141-04 Basic FOREIGN KEY constraint with the NO ACTION default for both referential delete action and referential update action.
Example: create table t11 (s1 int references t10);
Fail. The foreign-key constraint will only be checked if I have said "pragma foreign_keys = on;".
E141-06 CHECK constraints
Example: create table t12 (s1 int, s2 int, check (s1 = s2));
Okay.
E141-07 Column defaults
Example: create table t13 (s1 int, s2 int default -1);
Okay.
E141-08 NOT NULL inferred on primary key
Example: create table t14 (s1 int primary key);
Fail. I am able to insert NULL if I don’t explicitly say the column is NOT NULL.
E141-10 Names in a foreign key can be specified in any order
Example: create table t15 (s1 int, s2 int, primary key (s1,s2));
create table t16 (s1 int, s2 int, foreign key (s2,s1) references t15 (s1,s2));
Okay.
E151 Transaction support
E151-01 COMMIT statement
Example: commit;
Fail. I have to say BEGIN TRANSACTION first.
E151-02 ROLLBACK statement
Example: rollback;
Okay.
E152 Basic SET TRANSACTION statement
E152-01 SET TRANSACTION statement ISOLATION SERIALIZABLE clause
Example: set transaction isolation level serializable;
Fail. Syntax error.
E152-02 SET TRANSACTION statement READ ONLY and READ WRITE clauses
Example: set transaction read only;
Fail. Syntax error.
E153 Updatable queries with subqueries
E161 SQL comments using leading double minus
Example: --comment;
Okay.
E171 SQLSTATE support
Example: drop table no_such_table;
Fail. At least, the error message doesn’t hint that SQLSTATE exists.
E182 Host language binding
Okay. The existence of the shell executable proves there is a C binding.
F031 Basic schema manipulation
F031-01 CREATE TABLE statement to create persistent base tables
Example: create table t20 (t20_1 int not null);
Okay.
F031-02 CREATE VIEW statement
Example: create view t21 as select * from t20;
Okay.
F031-03 GRANT statement
Fail. SQLite doesn’t support privileges.
F031-04 ALTER TABLE statement: add column
Example: alter table t7 add column t7_2 varchar default 'q';
Okay.
F031-13 DROP TABLE statement: RESTRICT clause
Example: drop table t20 restrict;
Fail. Syntax error, and RESTRICT is not assumed.
F031-16 DROP VIEW statement: RESTRICT clause
Example: drop view v2 restrict;
Fail. Syntax error, and RESTRICT is not assumed.
F031-19 REVOKE statement: RESTRICT clause
Fail. SQLite does not support privileges.
F041 Basic joined table
F041-01 Inner join but not necessarily the INNER keyword
Example: select a.s1 from t7 a join t7 b;
Okay.
F041-02 INNER keyword
Example: select a.s1 from t7 a inner join t7 b;
Okay.
F041-03 LEFT OUTER JOIN
Example: select t7.*,t22.* from t22 left outer join t7 on (t22_1=s1);
Okay.
F041-04 RIGHT OUTER JOIN
Example: select t7.*,t22.* from t22 right outer join t7 on (t22_1=s1);
Fail. Syntax error.
F041-05 Outer joins can be nested
Example: select t7.*,t22.* from t22 left outer join t7 on (t22_1=s1) left outer join t23;
Okay.
F041-07 The inner table in a left or right outer join can also be used in an inner join
Example: select t7.* from t22 left outer join t7 on (t22_1=s1) inner join t22 on (t22_4=t22_5);
Okay. The query fails due to a syntax error but that’s to be expected.
F041-08 All comparison operators are supported (rather than just =)
Example: select * from t where 0=1 or 0>1 or 0<1 or 0<>1;
Okay.
F181 Multiple module support
Fail. SQLite doesn’t have modules.
F201 CAST function
Example: select cast(s1 as int) from t;
Okay.
F221 Explicit defaults
Example: update t set s1 = default;
Fail. Syntax error.
F261 CASE expression
F261-01 Simple CASE
Example: select case 1 when 0 then 5 else 7 end from t;
Okay.
F261-02 Searched CASE
Example: select case when 1 = 0 then 5 else 7 end from t;
Okay.
F261-03 NULLIF
Example: select nullif(s1,7) from t;
Okay.
F261-04 COALESCE
Example: select coalesce(s1,7) from t;
Okay.
F311 Schema definition statement
F311-01 CREATE SCHEMA
Fail. SQLite doesn’t have schemas or databases.
F311-02 CREATE TABLE for persistent base tables
Fail. SQLite doesn’t have CREATE TABLE inside CREATE SCHEMA.
F311-03 CREATE VIEW
Fail. SQLite doesn’t have CREATE VIEW inside CREATE SCHEMA.
F311-04 CREATE VIEW: WITH CHECK OPTION
Fail. SQLite doesn’t have CREATE VIEW inside CREATE SCHEMA.
F311-05 GRANT statement
Fail. SQLite doesn’t have GRANT inside CREATE SCHEMA.
F471 Scalar subquery values
Example: select s1 from t where s1 = (select count(*) from t);
Okay.
F481 Expanded NULL Predicate
Example: select * from t where row(s1,s1) is not null;
Fail. Syntax error.
F812 Basic flagging
Fail. SQLite doesn’t support any flagging.
S011 Distinct types
Example: create type x as float;
Fail. SQLite doesn’t support distinct types.
T321 Basic SQL-invoked routines
T321-01 User-defined functions with no overloading
Example: create function f () returns int return 5;
Fail. SQLite doesn’t support user-defined functions.
T321-02 User-defined procedures with no overloading
Example: create procedure p () begin end;
Fail. SQLite doesn’t support user-defined procedures.
T321-03 Function invocation
Example: select f(1) from t;
Fail. SQLite doesn’t support user-defined functions.
T321-04 CALL statement.
Example: call p();
Fail. SQLite doesn’t support user-defined procedures.
T321-05 RETURN statement.
Example: create function f() returns int return 5;
Fail. SQLite doesn’t support user-defined functions.
T631 IN predicate with one list element
Example: select * from t where 1 in (1);
Okay.
The Final Score
Fail: 59
Okay: 75
Update 2016-06-26: Originally I counted 60 to 74; that was an error.
So SQLite could claim to support most of the core features of the current standard, according to this counting method, after taking into account all the caveats and disclaimers embedded in the description above.
I anticipate the question, “Will ocelotgui (the Ocelot Graphical User Interface for MySQL and MariaDB) support SQLite too?” and the answer is “I don’t know.” The project would only take two weeks, but I have no idea whether it’s worth that much effort.
In the last while, I’ve concentrated on some ocelotgui bug fixes and on checking whether it runs on Windows as well as on Linux. It does, but only from source — see the instructions at https://github.com/ocelot-inc/ocelotgui/blob/master/windows.txt.

MariaDB 10.2 Window Functions

Today the first MariaDB 10.2 alpha popped up and for the first time there is support for window functions.
I’ll describe what’s been announced, what’s been expected, comparisons to other DBMSs, problems (including crashes and wrong answers), how to prepare, what you can use as a substitute while you wait.
I assume some knowledge of what window functions are. If you’d prefer an introductory tutorial, I’d suggest reading articles like this one by Joe Celko before you continue reading this post.
What’s been announced
The MariaDB sources are:
The release notes
The source code trees — the feature tree up till now has been github.com/MariaDB/server/commits/bb-10.2-mdev9543 but the version-10.2 download page has more choices and is probably more stable.
Sergei Petrunia and Vicentiu Ciorbaru, two developers who I think deserve the credit, made a set of slides for a conference in Berlin earlier this month. It seems to have some typos but is the best description I’ve seen so far.
On Wednesday April 20 Mr Petrunia will give a talk at the Percona conference. Alas, it coincides with Konstantin Osipov’s talk about Tarantool — which I’ve done some work for — which, if you somehow haven’t heard, is a NoSQL DBMS that’s stable and faster than others according to independent benchmarks like the one from Coimbra. What a shame that two such important talks are scheduled for the same time.
Anyway, it’s clear that I’ll have to update this post as more things happen.
What’s been expected
There have been several wishes / feature requests for window functions over the years.
Typical feature requests or forum queries are “Oracle-like Analytic Function RANK() / DENSE_RANK() [in 2004]”, “analytical function like RANK etc to be implemented [in 2008]”, “Is MySQL planning to implement CTE and Window functions? [in 2010]”.
Typical blog posts are Shlomi Noach’s “Three wishes for a new year [in 2012]” and Baron Schwartz’s “Features I’d like in MySQL: windowing functions [in 2013]”.
Typical articles mentioning the MySQL/MariaDB lack of window functions are “What PostgreSQL has over other open source SQL databases” and “Window Functions Comparison …”.
So it’s clear that there has been steady demand, or reason for demand, over the years.
My first applause moment is: Mr Petrunia and Mr Ciorbaru have addressed something that’s been asked for, rather than what they wished had been asked for.
Comparisons to other DBMSs
I know twelve DBMSs that support window functions. No screen is wide enough for a chart showing them all, so I’ll just list their window-function documents here so that you can click on the names to get the details:
APACHE DRILL,
CUBRID,
DB2 LUW,
DB2 z/OS 10,
DERBY,
FIREBIRD,
INFORMIX,
ORACLE,
POSTGRESQL,
SQL SERVER,
SYBASE,
TERADATA.
I’ll show MariaDB against The Big Three.
These functions are mentioned in the standard document as required by optional feature T611 Elementary OLAP operations:

Function    MariaDB  Oracle  DB2  SQL Server
DENSE_RANK  yes      yes     yes  yes
RANK        yes      yes     yes  yes
ROW_NUMBER  yes      yes     yes  yes

These functions are mentioned in the standard document as required by optional feature T612 Advanced OLAP operations:

Function      MariaDB  Oracle  DB2  SQL Server
CUME_DIST     yes      yes     no   yes
PERCENT_RANK  yes      yes     no   yes

These functions are mentioned in the standard document as required by optional features T614 through T617:

Function     MariaDB  Oracle  DB2  SQL Server
FIRST_VALUE  no       yes     yes  yes
LAG          no       yes     yes  yes
LAST_VALUE   no       yes     yes  yes
LEAD         no       yes     yes  yes
NTH_VALUE    no       yes     no   no
NTILE        yes      yes     no   yes

These are common functions which are in the standard and which can be window functions:

Function        MariaDB  Oracle  DB2  SQL Server
AVG             yes      yes     yes  yes
COUNT           yes      yes     yes  yes
COVAR_POP/SAMP  no       yes     yes  yes
MAX             no       yes     yes  yes
MIN             no       yes     yes  no
SUM             yes      yes     yes  yes
VAR_POP/SAMP    no       yes     yes  yes

Yes, MariaDB also supports non-standard functions like BIT_XOR, but they’re worthless for comparison purposes. What’s more important is that the MariaDB functions cannot be DISTINCT.
As for the options in the OVER clause … just the important ones …

Option               MariaDB    Oracle  DB2  SQL Server
ORDER BY             yes        yes     yes  yes
NULLS FIRST|LAST     no         "yes"   yes  no
PARTITION BY         yes        yes     yes  yes
PRECEDING|FOLLOWING  sometimes  yes     yes  yes

Those are the options that matter. The NULLS clause is important only because it shows how far an implementor will go to support the standard, rather than because most people care. MariaDB in effect supports NULLS HIGH|LOW, which is as good as Oracle — The Oracle manual puts it this way: “NULLS LAST is the default for ascending order, and NULLS FIRST is the default for descending order.” People who think that’s not cheating can add a comment at the end of this post.
From the above I suppose this second applause moment is justifiable: MariaDB has all the basics, and half of the advanced features that other DBMSs have.
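To see the basics in action, here is a small example (a sketch; the table and data are invented):

CREATE TABLE emps (dept CHAR(2), salary INT);
INSERT INTO emps VALUES ('d1',100),('d1',200),('d1',200),('d2',150);
SELECT dept, salary,
       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
FROM emps;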
Problems (including crashes and wrong answers)
The MariaDB announcement says:
“Do not use alpha releases on production systems! … Thanks, and enjoy MariaDB!”
Indeed anyone who used 10.2.0 in production would discover that enjoyable things can be bad for you.
I started with this database: …

create table t1 (s1 int, s2 char(5));
insert into t1 values (1,'a');
insert into t1 values (null,null);
insert into t1 values (1,null);
insert into t1 values (null,'a');
insert into t1 values (2,'b');
insert into t1 values (-1,'');
The following statements all cause the server to crash:

select row_number() over ();
select 1 as a, row_number() over (order by a) from dual;
select *, abs(row_number() over (order by s1))
 - row_number() over (order by s1) as X from t1;
select rank() over (order by avg(s1)) from t1;
The following statements all give the wrong answers:

select count(*) over (order by s2) from t1 where s2 is null;
select *,dense_rank() over (order by s2 desc),
dense_rank() over (order by s2) from t1;
select *, sum(s1) over (order by s1) from t1 order by s1;
select avg(s1), rank() over (order by s1) from t1;
The following statement causes the client to hang (it loops in mysql_store_result, I think this is the first time I’ve seen this type of error)

select *, avg(s1) over () from t1;
And now for the third applause line … to which you might be saying: huh? Aren’t those, er, less-than-desirable results? To which I would reply: yes, but two weeks ago there were far more and far bigger problems. We should be clapping for how quickly progress has been made, and guessing that this section of my post will be obsolete soon.
How to prepare
You have lots of time to get ready for 10.2, but may as well start now by getting rid of words that have special meaning for window functions.
The word OVER is reserved.
The newly supported function names — DENSE_RANK RANK ROW_NUMBER CUME_DIST PERCENT_RANK NTILE — are not reserved, and the names of functions which will probably be supported soon — FIRST_VALUE LAG LEAD LAST_VALUE NTH_VALUE — will probably not be reserved. But they might as well be, because you won’t be able to use those names for your own functions. Besides, they’re reserved in standard SQL.
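For example, a column that happens to be named with one of these words will need delimiting (a sketch, assuming a table t with a column named over):

SELECT over FROM t;     -- fails in 10.2, because OVER is now a reserved word
SELECT `over` FROM t;   -- works, because the identifier is delimited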
What you can use as a substitute
Suppose you don’t want to wait till MariaDB is perfect, or you’d like to stay with MySQL (which as far as I know has made less progress than MariaDB toward this feature). Well, in short: gee that’s too bad. But I have seen three claims about getting a slight subset.
One: Shlomi Noach claims you can use a trick with GROUP_CONCAT.
Two: Adrian Corston claims you can make delta functions with assignments.
Three: I claim that in ocelotgui you can put 'row_number() over ()' in a SELECT and get a row-number column even with older versions of MySQL or MariaDB (this is a recent change; it’s in the source, not the binary).
In fact all the “window_function_name() OVER ()” functions could be done easily in the client, if they’re in the select list and not part of an expression, and the result set is ordered. But I’m not sure whether that’s something to excite the populace.
There might be a “Four:”. I have not surveyed the various applications that can do cumulations. I suspect that mondrian is one, and open OLAP might be another, but haven’t looked at them.
Our own progress
For ocelotgui (Ocelot’s GUI client for MySQL and MariaDB) we had to adjust the syntax checker to highlight the new syntax in 10.2, as this screenshot shows.
So we now can claim to have “the only native-Linux GUI that correctly recognizes MariaDB 10.2 window functions”. Catchy slogan, eh? The beta download is at https://github.com/ocelot-inc/ocelotgui. I still expect that it will be out of beta in a few weeks.
