And the results are in:

Innodb (no compression/original) – 660GB
RocksDB – 209GB
TokuDB (snappy) – 144GB
TokuDB (LZMA) – 67GB

Benchmark performance with mysqlslap on production sample queries (8-9 very quick SELECTs + 1-2 medium SELECTs):

Innodb (original) – Avg: 0.100 seconds, Min: 0.091 seconds, Max: 0.182 seconds, Total: 5.101s
TokuDB (snappy) – Avg: 0.100 seconds, Min: 0.089 seconds, Max: 0.183 seconds, Total: 5.106s
RocksDB – Avg: 0.113 seconds, Min: 0.104 seconds, Max: 0.164 seconds, Total: 5.730s
TokuDB (LZMA) – Avg: 0.099 seconds, Min: 0.090 seconds, Max: 0.155 seconds, Total: 5.037s

Testing platform:

Platform | Linux
Release | CentOS release 6.8 (Final)
Kernel | 2.6.32-642.11.1.el6.x86_64
Architecture | CPU = 64-bit, OS = 64-bit
Threading | NPTL 2.12
Compiler | GNU CC version 4.4.7 20120313 (Red Hat 4.4.7-17)
SELinux | Disabled
Virtualized | VMWare
Processors | physical = 2, cores = 4, virtual = 4, hyperthreading = no
Speeds | 4×2299.998
Models | 4×Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Caches | 4×46080 KB
Memory | 15.6G

Observations – load during conversion:

TokuDB (snappy) – load 1.07, IOPS around 30MB/s
RocksDB – load 1.09, IOPS around 50-70MB/s (there seems to be a data load round and then a second round of compression afterwards)
TokuDB (LZMA) – load 3-4, IOPS around 7MB/s
Things I look for when optimising or debugging a Linux OS:
I/O scheduler (noop or deadline)
Linux kernel > 3.18 (multi-queuing)
IRQbalance > 1.0.8
File System: noatime, nobarrier
logfiles in different partition (if p…
Last year, I read ‘The Phoenix Project’. I liked it and as an IT manager in the past, I did experience high blood pressure during the SEV1 scenarios in the book.
I also liked the way DevOps methodology helped solve issues with IT as well as help the company succeed overall.
As a DBA, however, I did have some things that didn’t make sense to me about this story.
Bear in mind that the two major incidents in the book were database-related. So in this post, I would like to jot down some things I noticed and how they could have been solved, looking at them through a different lens.
Caution, Spoiler Alert
Incident No.1 – Tokenisation
In the first incident, a 3rd party supplier ran a script against the database to tokenise some personal data. This was related to an issue that information security had highlighted, but it had the result of affecting HR and accounting.
In the book, there is a complaint that there was no test environment to see if this script would have any negative effects on other parts of the organisation.
Now to me, this does make sense and at the same time, makes no sense at all.
If you mean that, back in the day, it was hard to get full environments set up to test changes on your application servers, then you would probably be right. Today, perhaps thanks to the methodology this book introduces, you probably do have those environments set up: either virtualised or in a container.
What doesn’t make sense to me is not having a test database. Reading through the book, there are mentions of an Oracle database and some MS SQL databases. As a mainly MySQL DBA, I have not always worked on those databases, but I have worked next to people who have. My observation is: if you have an Oracle database, you almost certainly have other dev/test/UAT/staging/pre-prod database servers as well.
Why do I think this? If you can afford to pay for an Oracle database, you would probably get more testing databases under the same license; the license is the most expensive part of using Oracle.
So a testing database, for trying out things that may affect the precious and expensive database server, is almost a certainty.
DBA as a Gatekeeper
Now, it seems shocking to me that the DBA had not been involved in the process of validating this 3rd party script. Old-school Oracle DBAs are involved in everything that happens on their servers.
Need a user on the database? Go to the DBA.
Need a database server for a new app? Please fill in these forms in triplicate, detailing the projected usage for the next 5 years.
In most companies, an Oracle DBA may even set up integration between other products, like Oracle HR and finance.
So how something that significant could have been run against the database without their knowledge is beyond me.
Assuming that a database field had in fact been messed up, then Oracle DBAs have a TON of really enviable backup and restore features.
They can query a table to view all the backups that are available to restore from and choose the point in time that is closest to what they need. A DBA could simply restore the database, fetch the specific table that had its column changed, and apply it to the production database.
It’s more than one table? Restore the database, replay the changes in the logs up to a point in time, and skip the parts the conversion script applied.
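To make the single-table case concrete, here is a hedged sketch of what it might look like on Oracle, assuming the damage is still within the undo retention window; hr.employees and the timestamp are stand-in names, not anything from the book:

```sql
-- Sketch (Oracle): rewinding one damaged table without a full restore.
-- Assumes undo retention still covers the point in time we want;
-- hr.employees and the timestamp are illustrative.
ALTER TABLE hr.employees ENABLE ROW MOVEMENT;

FLASHBACK TABLE hr.employees
  TO TIMESTAMP TO_TIMESTAMP('2017-01-05 09:00', 'YYYY-MM-DD HH24:MI');
```

When the window has passed, the restore-and-fetch route described above is the fallback.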
It seems to me that the authors wrote the book based on their own experiences, but those experiences occurred in companies that had no DBAs. Not having a DBA is a product of startups, not of old-school 1500-person car-parts manufacturers.
Incident No.2 – Conversion
There was a crippling database issue to do with a database conversion that was needed alongside a new code rollout. The issue caused a 2-day – break out the handheld receipt machines – downtime to the system.
Works on My Laptop
During the initial investigation, a developer said something along the lines of ‘it worked fine on my laptop’ when describing the performance of the database conversion scripts. The problem was that on production, it was x1000 slower. Now, I have written before about how not to be the one that deploys that slow query to production, and this really illustrates that situation. Apparently, they still didn’t have a database testing environment to test against.
However, on the topic above of ‘DBA as a gatekeeper’:
Why didn’t the DBA review the conversion scripts, and why wasn’t the DBA involved in the code review process for the SQL statements?
It could be that there wasn’t one in the company.
Another point was that they couldn’t cancel the conversion after they had started it and noticed how slow it was. If this was within a transaction or a single ALTER TABLE statement, why not?
If too many things have changed, could they not restore the database to a point-in-time before the changes were made?
Was the conversion x1000 slower, instead of maybe x10 slower, because of a foreign key check that could have been turned off?
A DBA would have given you those options.
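For the foreign key point, a minimal MySQL sketch of the kind of option a DBA could offer; it trades safety for speed, so it only makes sense when the incoming data is already known to be consistent:

```sql
-- Sketch (MySQL): skip constraint checking for the duration of a bulk
-- conversion in this session, then turn it back on.
SET SESSION foreign_key_checks = 0;
SET SESSION unique_checks = 0;

-- ... run the conversion statements here ...

SET SESSION unique_checks = 1;
SET SESSION foreign_key_checks = 1;
```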
After the hero turns things around and things begin to pick up, they decide to start a separate project to add predictive features to the main product. In it, they decide to bypass seeking permission for database changes and create a new database into which they copy production data from several locations. I very much like this approach, and it falls in line with the reactive microservices pattern.
This would make the book ahead of its time. Instead of managing one main database for the website (although they did mention in the book that they had a couple of dozen database servers), they can break it up into several database servers, based on functionality. What is required is to use tools – and I believe in 2012, they meant ETL tools – to migrate the needed data into these new database servers.
This would still need a DBA though, or at the very least a data engineer with an ops background, as you now need to:
Data model new environments based on data from old ones
Create and maintain data pipelines
Monitor for errors and fix data that didn’t make it
Fix data drift and re-sync data across servers
In addition, you now need to back up these additional database servers and monitor their availability and performance.
So while it adds complexity to the backend, and you are now moving from simple database maintenance to more of a data architecture role, it is the way forward. It is certainly the only way to have proper microservices with their own single-purpose, loosely coupled data stores.
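As a sketch of the SQL-script style of ETL mentioned above, here is a minimal nightly pull; the schema and table names (site, predictions, orders_feed) and columns are assumptions for illustration only:

```sql
-- Sketch: copy yesterday's rows from the main site database into a new,
-- functionality-specific database. All schema/table/column names are made up.
INSERT INTO predictions.orders_feed (order_id, customer_id, total, created_at)
SELECT order_id, customer_id, total, created_at
FROM site.orders
WHERE created_at >= CURDATE() - INTERVAL 1 DAY;
```

A pipeline of such statements, plus monitoring that they ran, is essentially the data engineer's job list above.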
it might have been better if they just hired a DBA to solve thier DB issues.— GuybrushThreepwoodⓋ (@jonathan_ukc) January 6, 2017
;)— Kevin Behr (@kevinbehr) January 8, 2017
When I consult at a company, I aim to identify issues with their database and give options on how to solve them. However, sometimes implementing those solutions may be a lengthier process than it needs to be, and sometimes they may not be implemented at all. During my career, I have observed some reasons why that might happen within organizations.
Obviously, the following observations will never happen at your company. I am just writing about them so that you might notice them in other places.
1. Legacy code
People don’t like to have anything to do with legacy code. It’s painful. It’s difficult. It’s risky to change. It runs business-critical functions. Worst of all, they didn’t write it. This can be a problem because, often, the most crippling database issues require changes to legacy code.
2. New Technologies or Methods
People don’t like you to introduce any new technologies they don’t want to learn and maintain. Not even different methods in technologies already being used. No fancy upgrades to the DB server, no new load balancers, and certainly don’t start using SQL statements in the code over their existing ORM.
3. Old Technologies or Methods
At the complete polar opposite, people in tech organisations don’t like you to introduce boring technologies. What would be the point of introducing boring (yet tested) technologies when they could be playing around with shiny new ones? There is a caveat to this: groups prefer it when other groups they depend on (let’s say developers depend on ops) choose boring, tested technologies. Just not for themselves. And vice versa.
4. Management Involvement
Last, but certainly not least, no one from upper management will get involved in resolving these issues and pushing solutions forward. No project/product manager or agile coach will be assigned to chase up issues. As far as they are concerned, this is an engineering issue, and as engineers, you need to sort it out yourselves. Only ‘change requests’ from the business have managers around them.
Final Thoughts
After some years of analysing database systems for performance issues, I am finally realising that I should also analyse human systems for performance issues.
Setting up databases in development environments can be challenging.
What I usually see is some automated process for setting up empty databases with up-to-date data structures. This is helpful for integration testing, but is tricky for actual development, as well as for performance testing.
It is difficult to conceptually get your head around writing a query when you cannot see any data in your tables.
You cannot possibly know if your query is slow before deploying it to production without running it against ‘some’ data.
Relevant Post: How to Not be the One that Deploys that Slow Query to Production
In addition, there can be a strict requirement to not let sensitive customer data be available outside certain secure environments and certainly not available to development environments.
What you would need to do is go over your database and separate the elements into different criteria:
Data structure management and user management should be, by now, a solved problem. You have systems like Liquibase and Flyway that manage this for you. Essentially, you can use these systems to automatically generate containers for your developers to use, or to set up empty local databases on developer machines using SQL scripts.
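As an illustration, a Flyway-style versioned migration is just a plain SQL file (for example V1__create_customer.sql) kept in a repository and applied automatically to each empty dev database or container; the table below is a made-up example:

```sql
-- Sketch: V1__create_customer.sql, a versioned migration Flyway would apply
-- in order. The customer table is illustrative only.
CREATE TABLE customer (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
```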
For user management, MySQL has a PAM plugin to manage users via LDAP, but you can manage this through scripts as well.
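A hedged sketch of the PAM route; the plugin name varies by build (auth_pam here, as in Percona Server), and the user and grant are illustrative:

```sql
-- Sketch: a MySQL user whose authentication is delegated to PAM, which can
-- in turn be configured against LDAP. 'jane' and app_db are made-up names.
CREATE USER 'jane'@'%' IDENTIFIED WITH auth_pam;
GRANT SELECT ON app_db.* TO 'jane'@'%';
```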
Referential tables (tables that contain data such as id = 1, status = ‘DONE’) should be small enough to be included in this stage as well. You need to identify which tables contain this very basic data and add it to the data structure repository or SQL file.
This is where things get a little bit tricky: You need to identify which tables are your Primary ‘feed data’ tables or Object tables. You then need to identify which tables are the Child tables of those Primary tables. Lastly, you need to identify which tables Map keys across different tables – either Primary to Child (as in multi-to-multi relationships) or Primary to Primary.
Once you have identified these tables, you can discern how much data you would like to keep in your development databases. My recommendation would be to go in these three directions:
Specify a set number of keys in the Primary tables and then get the data from the Child and Mapping tables based on those keys.
Specify a specific set of keys from the Primary tables and then get the data from the Child and Mapping tables based on those keys.
Keep data by a date range for the primary table and then use its keys to populate the Child and Mapping tables.
Make sure that the amount of data is adequate for your needs: not too small and not too large.
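A minimal sketch of the first direction above, with made-up schema and table names: take a set number of the most recent Primary keys, then pull only the matching Child rows:

```sql
-- Sketch: extract the 10,000 newest Primary rows, then only the Child rows
-- that reference them. prod/dev_extract, orders and order_items are
-- illustrative names.
CREATE TABLE dev_extract.orders AS
SELECT * FROM prod.orders ORDER BY id DESC LIMIT 10000;

CREATE TABLE dev_extract.order_items AS
SELECT oi.*
FROM prod.order_items oi
JOIN dev_extract.orders o ON oi.order_id = o.id;
```

The date-range direction works the same way, with a WHERE on the Primary table's date column instead of the LIMIT.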
This separation of table types can now help us identify sensitive data. Data structure and Referential tables should not contain sensitive data, and neither should Mapping tables. What would contain sensitive data are the Primary and Child tables.
Identify the columns in those tables where sensitive data may be kept and either:
Empty that data
Give it a default value (all emails will be email@example.com)
Obfuscate those values in some way
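A hedged sketch of the default-value and obfuscation options, run against a staging copy before dumping it; staging.customers and its columns are assumed names:

```sql
-- Sketch: neutralise sensitive columns on the staging copy.
-- All names here are illustrative.
UPDATE staging.customers
SET email = CONCAT('user', id, '@example.com'),  -- unique but fake addresses
    phone = '';                                  -- emptied outright
```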
You can make these changes either by outputting the data, with those changes applied, into an SQL file, or by dumping the data into a staging database, changing the sensitive data there, and then dumping it into an SQL file with a tool.
Ideally, this stage should go through a QA process/person before the company releases the data to generally available containers or repositories that keep a history of changes.
By taking the time to separate the different elements of a database, you make it less complicated, and you will be better able to automate parts of the database into your CI/CD process.
A counter table is a table that keeps counts of particular items or for certain keys. This can range from page counts on your blog to keeping track of how much of a limit a user has consumed for a particular item or service.

Usually, a counter table would be better kept in something like Memcached or Redis, as frequent increment updates are better suited to those in-memory systems. MySQL, and InnoDB in particular, has many stringent mechanisms to make sure that your data has been reliably written to disk. Just going through those mechanisms alone can make a counter table unsuitable, even before considering the speed it takes to update the actual table.

However, sometimes there is a need for certain assurances in failure scenarios that in-memory systems may not be suitable for: when they crash, the data kept in memory is cleared out.

In those cases, may I recommend what I consider an ‘aggregate counter table’. The idea here is to replace doing lots of increment updates and simply count the original base table you are interested in having counts for.

In short, instead of:

INSERT INTO base_table ...;
UPDATE counter_table SET value = value + 1 WHERE key = key1;
You would do:

INSERT INTO base_table ...;

and then, on an interval (say every 1 to 5 seconds):

INSERT INTO counter_table
SELECT key1, count(1), max(primarykey)
FROM base_table
WHERE last_count_position
GROUP BY key1
ON DUPLICATE KEY UPDATE value = value + recent_count;
In order to aggregate the base_table correctly, you need to keep some record of the last time or position you read from the base table. What I recommend you consider is either the primary key, assuming it is an integer, or a last_updated timestamp column. Below is an example of a counter table that keeps the last id of the primary key it counted from the base table:
CREATE TABLE counter_table (
  key_id int(10) unsigned NOT NULL,
  counts int(10) unsigned DEFAULT '0',
  lastprimary_id int(10) unsigned DEFAULT '0',
  PRIMARY KEY (key_id),
  KEY idx_camp (lastprimary_id)
) ENGINE=InnoDB;
In order to run your ‘refresh’ query, you would first need to query the counter_table like this:

SELECT max(lastprimary_id) FROM counter_table;
Then populate the counter table by including in the INSERT INTO … SELECT statement above a:

WHERE base_table.primarykey > lastprimary_id
This should be very fast and will prevent the many ‘database-attacking’ update queries that can become a serious bottleneck to your performance in the long run.

Downsides

This method doesn’t factor in whether rows in the base table were UPDATE’d or DELETE’d; it just counts rows. If that is a requirement, you can revert to using UPDATE statements such as:

UPDATE counter_table SET value = value - 1 WHERE key_id = key1;

with the understanding that this will happen infrequently.

You also now need to maintain a procedure and monitor that it runs on the interval you need. Fortunately, MySQL has scheduled Events to help with that.
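The interval-based refresh can be scheduled inside MySQL itself. A minimal sketch, assuming the counter_table definition above and carrying over the illustrative base_table, primarykey and key1 names:

```sql
-- Sketch only: schedule the aggregate refresh every 5 seconds with a MySQL
-- Event. base_table, primarykey and key1 are the example names from above.
DELIMITER //
CREATE EVENT refresh_counter_table
ON SCHEDULE EVERY 5 SECOND
DO
BEGIN
  DECLARE last_id INT UNSIGNED;

  -- Find the last position we counted up to
  SELECT COALESCE(MAX(lastprimary_id), 0) INTO last_id FROM counter_table;

  -- Count only the new rows and fold them into the existing counts
  INSERT INTO counter_table (key_id, counts, lastprimary_id)
  SELECT key1, COUNT(1), MAX(primarykey)
  FROM base_table
  WHERE primarykey > last_id
  GROUP BY key1
  ON DUPLICATE KEY UPDATE
    counts = counts + VALUES(counts),
    lastprimary_id = VALUES(lastprimary_id);
END //
DELIMITER ;

-- The event scheduler must be enabled for this to run:
-- SET GLOBAL event_scheduler = ON;
```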
There is an idea that data is sacred and needs to be stored forever. However, if you keep your data forever, you will, sooner or later, have a very large database. In order to keep operations running smoothly, it would be useful to allocate data that i…
The following query gets data from performance_schema in MySQL and attempts to regex the digest to the list of tables in the same schema.

SELECT d.*, (SELECT group_concat(distinct TABLE_NAME) FROM information_schema.TABLES WHERE table_schema = d….
I have been keeping an eye on MariaDB’s columnar store progress for a bit longer than half a year.

MariaDB chose to take the InfiniDB code after Calpont closed shop about two years ago and implemented it into their product. I was a bit wary about InfiniDB, as it was a columnar store without compression that had mixed reviews on Hacker News.

However, it seems like MariaDB have pulled it off. They have added the InfiniDB engine to MariaDB with all its quirks and extra commands, and they have added snappy compression as well. This is truly a huge win for them and their users, specifically in the area of reporting and analytics.

Here are two real-life examples of getting data ready for reporting that are currently happening in the wild:

1) MySQL -> Sqoop -> Hadoop – where you would need a) 5-6 additional servers, b) someone to set those servers up in a Hadoop cluster and then c) monitoring of the daily data transfer.
2) MySQL -> CDC -> Kafka -> Hadoop – a) more technologies to master, b) a few more servers and c) some more monitoring. But this time, it’s streaming.

To set all of this up could take from a couple of months to a year.

Now with MariaDB, you have:

1) MariaDB + some SQL scripts – such as INSERT INTO datawarehouse.facttable SELECT … FROM site.table1 WHERE date >= curdate() - interval 1 day;
2) MariaDB -> Pentaho/Talend -> MariaDB – could be a bit slower, but with a GUI and really a lot of monitoring out of the box.

As you can see, there are a lot fewer technologies, a lot fewer complexities, and it is a lot more straightforward to develop.

It is also very important to add that no one other than MariaDB is doing this. The closest you have is TokuDB, which is great and can also last you a while, but a columnar store is definitely more suited for this type of task.

So once again, congratulations to MariaDB for offering such a huge benefit to its users.
You probably heard that MySQL 5.7 supports JSON. But did you know that you can also use JSON with MySQL stored procedures, making them very flexible? Less talk, more code. (The data used is from the UK Land Registry dataset that I worked on.)
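In the spirit of ‘less talk, more code’, here is a hedged sketch of the kind of flexible procedure this enables; the land_registry table and its price and county columns are assumptions for illustration, not the actual dataset schema:

```sql
-- Sketch: a stored procedure that takes all of its filters as one JSON
-- argument, so callers can pass any combination of criteria. Table and
-- column names are illustrative.
DELIMITER //
CREATE PROCEDURE search_prices(IN opts JSON)
BEGIN
  SELECT *
  FROM land_registry
  WHERE price >= COALESCE(JSON_UNQUOTE(JSON_EXTRACT(opts, '$.min_price')), 0)
    AND (JSON_EXTRACT(opts, '$.county') IS NULL
         OR county = JSON_UNQUOTE(JSON_EXTRACT(opts, '$.county')));
END //
DELIMITER ;

-- Callers only pass the criteria they care about:
CALL search_prices('{"min_price": 250000}');
CALL search_prices('{"min_price": 250000, "county": "GREATER LONDON"}');
```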