Database Challenges and Innovations. Interview with Jim Starkey

“Isn’t it ironic that in 2016 a non-skilled user can find a web page from Google’s untold petabytes of data in millisecond time, but a highly trained SQL expert can’t do the same thing in a relational database one billionth the size?.–Jim Starkey.
I have interviewed Jim Starkey. A database legend, Jim’s career as an entrepreneur, architect, and innovator spans more than three decades of database history.
Q1. In your opinion, what are the most significant advances in databases in the last few years?
Jim Starkey: I’d have to say the “atom programming model” where a database is layered on a substrate of peer-to-peer replicating distributed objects rather than disk files. The atom programming model enables scalability, redundancy, high availability, and distribution not available in traditional, disk-based database architectures.
Q2. What was your original motivation to invent the NuoDB Emergent Architecture?
Jim Starkey: It all grew out of a long Sunday morning shower. I knew that the performance limits of single-computer database systems were in sight, so distributing the load was the only possible solution, but existing distributed systems required that a new node copy a complete database or partition before it could do useful work. I started thinking of ways to attack this problem and came up with the idea of peer to peer replicating distributed objects that could be serialized for network delivery and persisted to disk. It was a pretty neat idea. I came out much later with the core architecture nearly complete and very wrinkled (we have an awesome domestic hot water system).
Q3. In your career as an entrepreneur and architect what was the most significant innovation you did?
Jim Starkey: Oh, clearly multi-generational concurrency control (MVCC). The problem I was trying to solve was allowing ad hoc access to a production database for a 4GL product I was working on at the time, but the ramifications go far beyond that. MVCC is the core technology that makes true distributed database systems possible. Transaction serialization is like Newtonian physics – all observers share a single universal reference frame. MVCC is like special relativity, where each observer views the universe from his or her reference frame. The views appear different but are, in fact, consistent.
Q4. Proprietary vs. open source software: what are the pros and cons?
Jim Starkey: It’s complicated. I’ve had feet in both camps for 15 years. But let’s draw a distinction between open source and open development. Open development – where anyone can contribute – is pretty good at delivering implementations of established technologies, but it’s very difficult to push the state of the art in that environment. Innovation, in my experience, requires focus, vision, and consistency that are hard to maintain in open development. If you have a controlled development environment, the question of open source versus propriety is tactics, not philosophy. Yes, there’s an argument that having the source available gives users guarantees they don’t get from proprietary software, but with something as complicated as a database, most users aren’t going to try to master the sources. But having source available lowers the perceived risk of new technologies, which is a big plus.
Q5. You led the Falcon project – a transactional storage engine for the MySQL server- through the acquisition of MySQL by Sun Microsystems. What impact did it have this project in the database space?
Jim Starkey: In all honesty, I’d have to say that Falcon’s most important contribution was its competition with InnoDB. In the end, that competition made InnoDB three times faster. Falcon, multi-version in memory using the disk for backfill, was interesting, but no matter how we cut it, it was limited by the performance of the machine it ran on. It was fast, but no single node database can be fast enough.
Q6. What are the most challenging issues in databases right now?
Jim Starkey: I think it’s time to step back and reexamine the assumptions that have accreted around database technology – data model, API, access language, data semantics, and implementation architectures. The “relational model”, for example, is based on what Codd called relations and we call tables, but otherwise have nothing to do with his mathematic model. That model, based on set theory, requires automatic duplicate elimination. To the best of my knowledge, nobody ever implemented Codd’s model, but we still have tables which bear a scary resemblance to decks of punch cards. Are they necessary? Or do they just get in the way?
Isn’t it ironic that in 2016 a non-skilled user can find a web page from Google’s untold petabytes of data in millisecond time, but a highly trained SQL expert can’t do the same thing in a relational database one billionth the size?. SQL has no provision for flexible text search, no provision for multi-column, multi-table search, and no mechanics in the APIs to handle the results if it could do them. And this is just one a dozen problems that SQL databases can’t handle. It was a really good technical fit for computers, memory, and disks of the 1980’s, but is it right answer now?
Q7. How do you see the database market evolving?
Jim Starkey: I’m afraid my crystal ball isn’t that good. Blobs, another of my creations, spread throughout the industry in two years. MVCC took 25 years to become ubiquitous. I have a good idea of where I think it should go, but little expectation of how or when it will.
Qx. Anything else you wish to add?
Jim Starkey: Let me say a few things about my current project, AmorphousDB, an implementation of the Amorphous Data Model (meaning, no data model at all). AmorphousDB is my modest effort to question everything database.
The best way to think about Amorphous is to envision a relational database and mentally erase the boxes around the tables so all records free float in the same space – including data and metadata. Then, if you’re uncomfortable, add back a “record type” attribute and associated syntactic sugar, so table-type semantics are available, but optional. Then abandon punch card data semantics and view all data as abstract and subject to search. Eliminate the fourteen different types of numbers and strings, leaving simply numbers and strings, but add useful types like URL’s, email addresses, and money. Index everything unless told not to. Finally, imagine an API that fits on a single sheet of paper (OK, 9 point font, both sides) and an implementation that can span hundreds of nodes. That’s AmorphousDB.
Jim Starkey invented the NuoDB Emergent Architecture, and developed the initial implementation of the product. He founded NuoDB [formerly NimbusDB] in 2008, and retired at the end of 2012, shortly before the NuoDB product launch.
Jim’s career as an entrepreneur, architect, and innovator spans more than three decades of database history from the Datacomputer project on the fledgling ARPAnet to his most recent startup, NuoDB, Inc. Through the period, he has been
responsible for many database innovations from the date data type to the BLOB to multi-version concurrency control (MVCC). Starkey has extensive experience in proprietary and open source software.
Starkey joined Digital Equipment Corporation in 1975, where he created the Datatrieve family of products, the DEC Standard Relational Interface architecture, and the first of the Rdb products, Rdb/ELN. Starkey was also software architect for DEC’s database machine group.
Leaving DEC in 1984, Starkey founded Interbase Software to develop relational database software for the engineering workstation market. Interbase was a technical leader in the database industry producing the first commercial implementations of heterogeneous networking, blobs, triggers, two phase commit, database events, etc. Ashton-Tate acquired Interbase Software in 1991, and was, in turn, acquired by Borland International a few months later. The Interbase database engine was released open source by Borland in 2000 and became the basis for the Firebird open source database project.
In 2000, Starkey founded Netfrastructure, Inc., to build a unified platform for distributable, high quality Web applications. The Netfrastructure platform included a relational database engine, an integrated search engine, an integrated Java virtual machine, and a high performance page generator.
MySQL, AB, acquired Netfrastructure, Inc. in 2006 to be the kernel of a wholly owned transactional storage engine for the MySQL server, later known as Falcon. Starkey led the Falcon project through the acquisition of MySQL by Sun Microsystems.
Jim has a degree in Mathematics from the University of Wisconsin.
For amusement, Jim codes on weekends, while sailing, but not while flying his plane.
– NuoDB Emergent Architecture (.PDF)
– On Database Resilience. Interview with Seth Proctor, ODBMs Industry Watch, March 17, 2015
Related Posts
– Challenges and Opportunities of The Internet of Things. Interview with Steve Cellini, ODBMS Industry Watch, October 7, 2015
– Hands-On with NuoDB and Docker, BY MJ Michaels, NuoDB. ODBMS.org– OCT 27 2015
– How leading Operational DBMSs rank popularity wise? By Michael Waclawiczek– ODBMS.org · JANUARY 27, 2016
– A Glimpse into U-SQL BY Stephen Dillon, Schneider Electric, ODBMS.org-DECEMBER 7, 2015
– Gartner Magic Quadrant for Operational DBMS 2015
Follow us on Twitter: @odbmsorg

MySQL lost “AUTO_INCREMENT” after a long time


I create a empty table and then alter this table’s “AUTO_INCREMENT” starts from 10000000.After a long time ,MySQL lost “AUTO_INCREMENT” definition when I show this table again. MariaDB has the same problem but Percona Server is fine.

Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 141028725
Server version: 5.6.17-log MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective

Type ‘help;’ or ‘\h’ for help. Type ‘\c’ to clear the current input statement.

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> DROP TABLE IF EXISTS `test_bug` ;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> CREATE TABLE test_bug (
-> `consult_order_id` BIGINT ,
-> `pay_channel` INT COMMENT ,
-> `price` BIGINT ,
-> `trade_no` VARCHAR (64) ,
-> `right_no` VARCHAR (64) ,
-> `item_id` BIGINT NOT NULL,
-> `source_type` INT NOT NULL ,
Query OK, 0 rows affected (0.01 sec)

mysql> ALTER TABLE test_bug
-> ADD INDEX index_consult_order (consult_order_id) ;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> ALTER TABLE test_bug
-> ADD INDEX index_right_no (right_no) ;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> ALTER TABLE test_bug
-> AUTO_INCREMENT = 100000000 ;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> show create table test_bug;

CREATE TABLE `test_bug` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`consult_order_id` bigint(20) DEFAULT NULL ,
`pay_channel` int(11) DEFAULT NULL COMMENT ,
`price` bigint(20) DEFAULT NULL COMMENT ,
`trade_no` varchar(64) DEFAULT NULL COMMENT ,
`right_no` varchar(64) DEFAULT NULL COMMENT ,
`item_id` bigint(20) NOT NULL COMMENT ,
`source_type` int(11) NOT NULL COMMENT ,
`gmt_create` datetime DEFAULT CURRENT_TIMESTAMP ,
KEY `index_consult_order` (`consult_order_id`),
KEY `index_right_no` (`right_no`)

1 row in set (0.00 sec)

wait about 24 hours , then show create table :

mysql> show create table test_bug;

CREATE TABLE `test_bug` (
`consult_order_id` BIGINT (20) DEFAULT NULL,
`pay_channel` INT (11) DEFAULT NULL,
`trade_no` VARCHAR (64) DEFAULT NULL,
`right_no` VARCHAR (64) DEFAULT NULL,
`item_id` BIGINT (20) NOT NULL,
`source_type` INT (11) NOT NULL,
KEY `index_consult_order` (`consult_order_id`),
KEY `index_right_no` (`right_no`)

1 row in set (0.00 sec)
AUTO_INCREMENT disappears .
Bug has been reported :https://bugs.mysql.com/bug.php?id=77743

Tracking ASM Metrics

Collecting ASM IO Metrics If you are an Oracle DBA, then it is quite likely that Oracle ASM is used as the storage management for at least some of the databases you manage. Eventually you will want to see ASM metrics that can be used to track the performance of the storage over time.  There…

Architecture of data warehouse which is based on MQ

Recently, we create a mysql data warehouse which is based on message queue.
Most companies must prepare for particular queries in their systems if they consider to split their databases or tables into many pieces.
some problems should be solved in this situation:
1. how to get correct results in-time
2. how to build strong data warehouse for future analyst
These policies were used by YHD
They have already deployed a middle-ware layer to support these requests (between web apps and databases). Every aggregation SQL was splited into many small SQLs and runs in every data nodes.The Final result is the aggregation of these all small SQLs. In this procedure, everything was computed in memory to get high performance.
In data warehouse layer, they use self-defined ETL tools to extract data from different databases to oracle-Exadata platform. Log-based data was put into hadoop and hbase.
I found a new solution
With Canal and Roma (visit previous PDF roma system) , we could build a data warehouse which is based on metaQ. (metaQ is the final storage of roma) , so we can put some simple queries on this data warehouse directly.
We could use MySQL to build this Data warehouse and use original replication in these databases (everything is simple, especially using multiple source feature via MariaDB).
Disadvantages of this architecture: MySQL database is not the best choice for data warehouse. So we need another analyst platform to handle other log-based data.
Most BI systems were built by very expensive commercial software . For small and medium sized companies, this architecture can save a lot of costs.
Client to aggregate messages:

split and merge aggregation :

total architecture:

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569