Author: Open Query

MariaDB 10.3 use case: Hidden PRIMARY KEY column for closed/legacy applications

I really like MariaDB 10.3, it has a lot of interesting new features.  Invisible (hidden) columns, for instance.  Here is a practical use case:
You’re dealing with a legacy application, possibly you don’t even have the source code (or modifying is costly), and this app doesn’t have a PRIMARY KEY on each table. Consequences:

Even though InnoDB would have an internal hidden ID in this case (you can’t access that column at all), it affects InnoDB’s locking strategy – I don’t think this aspect is explicitly documented, but we’ve seen table-level locks in this scenario where we’d otherwise see more granular locking;
Galera cluster really wants PKs;
Asynchronous row-based replication can work without PKs, but it is more efficient with;

So, in a nutshell, your choices for performance and scaling are restricted.
On the other hand, you can’t just add a new ID column, because the application may:

use SELECT * and not be able to handle seeing the extra column, and/or
use INSERT without an explicit list of columns and then the server would chuck an error for the missing column.

The old way
Typically, what we used to do for cases like this is hunt for UNIQUE indexes that can be converted to PK. That’d be a clean operation with no effect on the application. We use a query like the following to help us find out if this is feasible:

SELECT
CONCAT(s.TABLE_SCHEMA,’.’,s.TABLE_NAME) AS tblname,
INDEX_NAME AS idxname,
GROUP_CONCAT(s.COLUMN_NAME) AS cols,
GROUP_CONCAT(IF(s.NULLABLE=’YES’,’X’,’-‘)) AS nullable
FROM INFORMATION_SCHEMA.STATISTICS s
WHERE s.table_schema NOT IN (‘information_schema’,’performance_schema’,’mysql’)
AND s.NON_UNIQUE=0
GROUP BY tblname,idxname
HAVING nullable LIKE ‘%X%’
ORDER BY tblname;
The output is like this:

+———-+———+——+———-+
| tblname  | idxname | cols | nullable |
+———-+———+——+———-+
| test.foo | a       | a,b  | -,X      |
+———-+———+——+———-+
We see that table test.foo has a UNIQUE index over columns (a,b), but column b is NULLable.  A PK may not contain any NULLable columns, so it would not be a viable candidate.
All is not yet lost though, we can further check whether the column data actually contains NULLs. If it doesn’t, we could change the column to NOT NULL and thus solve that problem.  But strictly speaking, that’s more risky as we may not know for certain that the application will never use a NULL in that column. So that’s not ideal.
IF all tables without a PK have existing (or possible) UNIQUE indexes without any NULLable columns, we can resolve this issue. But as you can appreciate, that won’t always be the case.
With MariaDB 10.3 and INVISIBLE columns
Now for Plan B (or actually, our new plan A as it’s much nicer).  Let’s take a closer look at the foo table from the above example:

CREATE TABLE foo (
a int(11) NOT NULL,
b int(11) DEFAULT NULL,
UNIQUE KEY uidx (a,b)
) ENGINE=InnoDB
Our new solution:
ALTER TABLE foo ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY INVISIBLE FIRST
So we’re adding a AUTO_INCREMENT new column named ‘id’, technically first in the table, but we also flag it as invisible.
Normally, you won’t notice, see how we can use INSERT without an explicit (a,b) column list:
MariaDB [test]> INSERT INTO  foo VALUES (2,24);
Query OK, 1 row affected (0.007 sec)
MariaDB [test]> SELECT * FROM foo;
+—+——+
| a | b    |
+—+——+
| 1 |   20 |
| 2 |   24 |
+—+——+
So it won’t even show up with SELECT *.  Fabulous.  We can see the column only if we explicitly ask for it:

MariaDB [test]> SELECT id,a,b FROM foo;
+—-+—+——+
| id | a |    b |
+—-+—+——+
|  1 | 1 |   20 |
|  2 | 2 |   24 |
+—-+—+——+
We solved our table structural issue by introducing a clean AUTO_INCREMENT id column covered by a PRIMARY KEY, while the application remains perfectly happy with what it can SELECT and INSERT. Total win!

RDS Aurora MySQL Cost

I promised to do a pricing post on the Amazon RDS Aurora MySQL pricing, so here we go.  All pricing is noted in USD (we’ll explain why)
We compared pricing of equivalent EC2+EBS server instances, and verified our calculation model with Amazon’s own calculator and examples.  We use the pricing for Australia (Sydney data centre). Following are the relevant Amazon pricing pages from which we took the pricing numbers, formulae, and calculation examples:

Amazon EC pricing (on demand)
Amazon EBS pricing

Amazon RDS Aurora pricing (on demand)
Amazon AWS calculator tool

Base Pricing Details

Specs
 
 
 
 
EC2
 
 
RDS Aurora MySQL
 

instance type
vCPU
ECU
GB RAM
 
Storage
Linux/hr
 
instance type
Price/hr

r4.large
2
7
15.25

EBS Only
$0.160

db.r4.large
$0.350

r4.xlarge
4
13.5
30.5

EBS Only
$0.319

db.r4.xlarge
$0.700

r4.2xlarge
8
27
61

EBS Only
$0.638

db.r4.2xlarge
$1.400

r4.4xlarge
16
53
122

EBS Only
$1.277

db.r4.4xlarge
$2.800

r4.8xlarge
32
99
244

EBS Only
$2.554

db.r4.8xlarge
$5.600

r4.16xlarge
64
195
488

EBS Only
$5.107

db.r4.16xlarge
$11.200

That’s not all we need, because both EBS and Aurora have some additional costs we need to factor in.
EBS pricing components (EBS Provisioned IOPS SSD (io1) volume)
“Volume storage for EBS Provisioned IOPS SSD (io1) volumes is charged by the amount you provision in GB per month until you release the storage. With Provisioned IOPS SSD (io1) volumes, you are also charged by the amount you provision in IOPS (input/output operations per second) per month. Provisioned storage and provisioned IOPS for io1 volumes will be billed in per-second increments, with a 60 second minimum.”

Storage Rate $0.138 /GB/month of provisioned storage“For example, let’s say that you provision a 2000 GB volume for 12 hours (43,200 seconds) in a 30 day month. In a region that charges $0.125 per GB-month, you would be charged $4.167 for the volume ($0.125 per GB-month * 2000 GB * 43,200 seconds / (86,400 seconds/day * 30 day-month)).”

I/O Rate $0.072 /provisioned IOPS-month“Additionally, you provision 1000 IOPS for your volume. In a region that charges $0.065 per provisioned IOPS-month, you would be charged $1.083 for the IOPS that you provisioned ($0.065 per provisioned IOPS-month * 1000 IOPS provisioned * 43,200 seconds /(86,400 seconds /day * 30 day-month)).”

Other Aurora pricing components

Storage Rate $0.110 /GB/month(No price calculation examples given for Aurora storage and I/O)

I/O Rate $0.220 /1 million requests(Presuming IOPS equivalence / Aurora ratio noted from arch talk)

So this provides us with a common base, instance types that are equivalent between Aurora and EC2.  All other Aurora instances types are different, so it’s not possible to do a direct comparison in those cases.  Presumably we can make the assumption that the pricing ratio will similar for equivalent specs.
On Demand vs Reserved Instances
We realise we’re calculating on the basis of On Demand pricing.  But we’re comparing pricing within AWS space, so presumably the savings for Reserved Instances are in a similar ballpark.
Other factors

We have 720 hours in a 30 day month, which is 2592000 seconds.
70% read/write ratio – 70% reads (used to calculate the effective Aurora IOPS)
10% read cache miss -10% cache miss rate on reads
Aurora I/O ratio: 3 (Aurora requiring 2 IOPS for a commit vs 6 in MySQL – even though this is a pile of extreme hogwash in terms of that pessimistic MySQL baseline)

We also spotted this note regarding cross-AZ Aurora traffic:
“Amazon RDS DB Instances inside VPC: For data transferred between an Amazon EC2 instance and Amazon RDS DB Instance in different Availability Zones of the same Region, Amazon EC2 Regional Data Transfer charges apply on both sides of transfer.”
So this would apply to application DB queries issued across an AZ boundary, which would commonly happen during failover scenarios.  In fact, we know that this happens during regular operations with some EC2 setups, because the loadbalancing already goes cross-AZ.  So that costs extra also.  Now you know!  (note: we did not factor this in to our calculations.)
Calculation Divergence
Our model comes up with identical outcomes for the examples Amazon provided, however it comes up 10-15% lower than Amazon’s calculator for specific Aurora configurations.  We presume that the difference may lie in the calculated Aurora I/O rate, as that’s the only real “unknown” in the model.  Amazon’s calculator does not show what formulae it uses for the sub-components, nor sub-totals, and we didn’t bother to tweak until we got at the same result.
It’s curious though, as the the architecture talk makes specific claims about Aurora’s I/O efficiency (which presume optimal Aurora situation and a dismal MySQL reference setup, something which I already raised in our initial Aurora post).  So apparently the Amazon calculator assumes worse I/O performance than the technical architecture talk!
Anyhow, let’s just say our costing is conservative, as the actual cost is higher on the Aurora end.
Scenarios
Here we compare with say a MySQL/MariaDB Galera setup across 3 AZs running on EC2+EBS.  While this should be similar in overall availability and read-capacity, note that

you can write to all nodes in a Galera cluster, whereas Aurora currently has a single writer/master;
Galera doesn’t require failover changes as all its nodes are technically writers anyhow, whereas Aurora failover causes a cluster outage of at least 30 seconds.

Servers
R/Zones
Instance
GB DB
I/O rate
 
 
EC2
EBS

 
 
Aurora
 
 
 

 
 
 
 
 
Read IOPS
 
Instances
Storage
I/O
EC2 Total
 
Instances
Storage
I/O
Aurora Total

3
3
r4.xlarge
250
2,000
740

$689
$104
$160
$952
 
$1,512
$83
$141
$1,735

6
3
r4.xlarge
250
2,000
740

$1,378
$207
$320
$1,905
 
$3,024
$83
$141
$3,247

 

 
 

When using the Amazon calculator, Aurora comes out at about double the EC2.  But don’t take our word for it, do try this for yourself.
Currency Consequences
While pricing figures are distinct per country that Amazon operates in, the charges are always in USD.  So this means that the indicated pricing is, in the end, in USD, and thus subject to currency fluctuations (if your default currency is not USD).  What does this mean?
USD-AUD rate chart 2008-2018, from xe.comSo USD 1,000 can cost as little as AUD 906 or as much as AUD 1,653, at different times over the last 10 years.  That’s quite a range!
Conclusion
As shown above, our calculation with Aurora MySQL shows it costing about twice as much.  This is based on a reference MySQL/MariaDB+Galera with roughly the same scaling and resilience profile (e.g. the ability to survive DC outages).  In functional terms, particularly with Aurora’s 30+second outage profile during failover, Galera comes out on top at half the cost.
So when is Aurora cheaper, as claimed by Amazon?
Amazon makes claims in the realm of “1/10th the cost”. Well, that may well be the case when comparing with the TCO of Oracle or MS SQL Server, and it’s fairly typical when comparing a proprietary system with an Open Source based one (mind again that Aurora is not actually Open Source as Amazon does not make their source code available, but it’s based on MySQL).
The only other way we see is to seriously compromise on the availability (resilience).  In our second sample calculation, we use 2 instances per AZ.  This is not primarily for performance, but so that application servers in an AZ don’t have to do cross-DC queries when one instance fails.  In the case of Aurora, spinning up a new instance on the same dataset requires 15 minutes.  So, do you want to take that hit?  If so, you can save money there.  If not, it’s still costly.
But hang on, if you’re willing to make the compromise on availability, you could reduce the Galera setup also, to only one instance per AZ.  Yep!
So, no matter how you tweak it, Aurora is about twice the cost, with (in our opinion) a less interesting failover profile.
The Price of RDS Convenience
What you get with RDS/Aurora is the promise of convenience, and that’s what you pay for.  But, mind that our comparison worked all within AWS space anyway, the EC2 instances we used for MySQL/MariaDB+Galera already use the same basic infrastructure, dashboard and management API as well.  So you pay double just to go to RDS/Aurora, relative to building on EC2.
To us, that cost seems high.  If you spend some, or even all that money on engineering that convenience around your particular setup, and even outsource that task and its maintenance, you get a nicer setup at the same or a lower cost.  And last but not least, that cost will be more predictable – most likely the extra work will be charged in your own currency, too.
Cost Predictability and Budget
You can do a reasonable ball-park calculation of AWS EC2 instances that are always active, but EBS already has some I/O charges which make the actual cost rather more variable, and Aurora adds a few more variables on top of that.  I’m still amazed that companies go for this, even though they traditionally prefer a known fixed cost (even if higher) over a variable cost.  Choosing the variable cost breaks with some fundamental business rules, for the sake of some convenience.
The advantage of known fixed costs is that you can budget properly, as well as project future costs based on growth and other business factors.  Purposefully ditching that realm, while exposing yourself to currency fluctuations at the same time, seems most curious.  How do companies work this into their budgets?  Because others do so?  Well, following the neighbours is not always a good idea.  In this case, it might be costly as well as financially risky.

RDS Aurora MySQL and Service Interruptions

In Amazon space, any EC2 or Service instance can “disappear” at any time.  Depending on which service is affected, the service will be automatically restarted.  In EC2 you can choose whether an interrupted instance will be restarted, or left shutdown.
For an Aurora instance, an interrupted instance is always restarted. Makes sense.
The restart timing, and other consequences during the process, are noted in our post on Aurora Failovers.
Aurora Testing Limitations
As mentioned earlier, we love testing “uncontrolled” failovers.  That is, we want to be able to pull any plug on any service, and see that the environment as a whole continues to do its job.  We can’t do that with Aurora, because we can’t control the essentials:

power button;
reset switch;
ability to kill processes on a server;
and the ability to change firewall settings.

In Aurora, an instance is either running, or will (again) be running shortly.  So that we know.  Aurora MySQL also offers some commands that simulate various failure scenarios, but since they are built-in we can presume that those scenarios are both very well tested, as well as covered by the automation around the environment.  Those clearly defined cases are exactly the situations we’re not interested in.
What if, for instance, a server accepts new connections but is otherwise unresponsive?  We’ve seen MySQL do this on occasion.  Does Aurora catch this?  We don’t know and  we have no way of testing that, or many other possible problem scenarios.  That irks.
The Need to Know
If an automated system is able to catch a situation, that’s great.  But if your environment can end up in a state such as described above and the automated systems don’t catch and handle it, you could be dead in the water for an undefined amount of time.  If you have scripts to catch cases such as these, but the automated systems catch them as well, you want to be sure that you don’t trigger “double failovers” or otherwise interfere with a failover-in-progress.  So either way, you need to know and and be aware whether a situation is caught and handled, and be able to test specific scenarios.
In summary: when you know the facts, then you can assess the risk in relation to your particular needs, and mitigate where and as desired.
A corporate guarantee of “everything is handled and it’ll be fine” (or as we say in Australia “She’ll be right, mate!“) is wholly unsatisfactory for this type of risk analysis and mitigation exercise.  Guarantees and promises, and even legal documents, don’t keep environments online.  Consequently, promises and legalities don’t keep a company alive.
So what does?  In this case, engineers.  But to be able to do their job, engineers need to know what parameters they’re working with, and have the ability to test any unknowns.  Unfortunately Aurora is, also in this respect, a black box.  You have to trust, and can’t comprehensively verify.  Sigh.

RDS Aurora MySQL Failover

Right now Aurora only allows a single master, with up to 15 read-only replicas.
Master/Replica Failover
We love testing failure scenarios, however our options for such tests with Aurora are limited (we might get back to that later).  Anyhow, we told the system, through the RDS dashboard, to do a failover. These were our observations:
Role Change Method
Both master and replica instances are actually restarted (the MySQL uptime resets to 0).
This is quite unusual these days, we can do a fully controlled roll change in classic async replication without a restart (CHANGE MASTER TO …), and Galera doesn’t have read/write roles as such (all instances are technically writers).
Failover Timing
Failover between running instances takes about 30 seconds.  This is in line with information provided in the Aurora FAQ.
Failover where a new instance needs to be spun up takes 15 minutes according to the FAQ (similar to creating a new instance from the dash).
Instance Availability
During a failover operation, we observed that all connections to the (old) master, and the replica that is going to be promoted, are first dropped, then refused (the refused will be during the period that the mysqld process is restarting).
According to the FAQ, reads to all replicas are interrupted during failover.  Don’t know why.
Aurora can deliver a DNS CNAME for your writer instance. In a controlled environment like Amazon, with guaranteed short TTL, this should work ok and be updated within the 30 seconds that the shortest possible failover scenario takes.  We didn’t test with the CNAME directly as we explicitly wanted to observe the “raw” failover time of the instances themselves.
Caching State
On the promoted replica, the buffer pool is saved and loaded (warmed up) on the restart; good!  Note that this is not special, it’s desired and expected to happen: MySQL and MariaDB have had InnoDB buffer pool save/restore for years.
On the old master (new replica/slave), the buffer pool is left cold (empty).  Don’t know why.
Because of the server restart, other caches are of course cleared also.  I’m not too fussed about the query cache (although, deprecated as it is, it’s currently still commonly used), but losing connections is a nuisance.
Statistics
Because of the instance restarts, the running statistics (SHOW GLOBAL STATUS) are all reset to 0. This is annoying, but should not affect proper external stats gathering, other than for uptime.
On any replica, SHOW ENGINE INNODB STATUS comes up empty. Always.  This seems like obscurity to me, I don’t see a technical reason to not show it.  I suppose that with a replica being purely read-only, most running info is already available through SHOW GLOBAL STATUS LIKE ‘innodb%’, and you won’t get deadlocks on a read-only slave.
Multi-Master
Aurora MySQL multi-master was announced at Amazon re:Invent 2017, and appears to currently be in restricted beta test.  No date has been announced for general availability.
We’ll have to review it when it’s available, and see how it works in practice.
Conclusions
I don’t understand why the old master gets a cold InnoDB buffer pool.
I wouldn’t think a complete server restart is necessary, but since we don’t have insight in the internals, who knows.
Losing connections across the cluster is a real nuisance that really impacts applications.  Here’s why.  When a C client (on which most MySQL APIs are based, or modelled) is disconnected, it passes back a specific error to the application.  When the application makes its next query call, the C client will automatically reconnect first (so the client does not have to explicitly reconnect).  However, this does mean that the application has to handle disconnects gracefully without chucking hissy-fits at users, and I know for a fact that that’s not how many (most?) applications are written.  Consequentially, an Aurora failover will make the frontend of an application look like a disaster zone for about 30 seconds.  I appreciate that this is not Aurora’s fault, it’s sloppy application development that causes this, but it’s a real-world fact we have to deal with, and our other cluster and replication options do not trigger this scenario.

TEXT and VARCHAR inefficiencies in your db schema

The TEXT and VARCHAR definitions in many db schemas are based on old information – that is, they appear to be presuming restrictions and behaviour from MySQL versions long ago. This has consequences for performance. To us, use of for instance VARCHAR(255) is a key indicator for this. Yep, an anti-pattern.
VARCHAR
In MySQL 4.0, VARCHAR used to be restricted to 255 max. In MySQL 4.1 character sets such as UTF8 were introduced and MySQL 5.1 supports VARCHARs up to 64K-1 in byte length. Thus, any occurrence of VARCHAR(255) indicates some old style logic that needs to be reviewed.
Why not just set the maximum length possible? Well…
A VARCHAR is subject to the character set it’s in, for UTF8 this means either 3 or 4 (utf8mb4) bytes per character can be used. So if one specifies VARCHAR(50) CHARSET utf8mb4, the actual byte length of the stored string can be up to 200 bytes. In stored row format, MySQL uses 1 byte for VARCHAR length when possible (depending on the column definition), and up to 2 bytes if necessary. So, specifying VARCHAR(255) unnecessarily means that the server has to use a 2 byte length in the stored row.
This may be viewed as nitpicking, however storage efficiency affects the number of rows that can fit on a data page and thus the amount of I/O required to manage a certain amount of rows. It all adds up, so having little unnecessary inefficiencies will cost – particularly for larger sites.
VARCHAR best practice
Best practice is to set VARCHAR to the maximum necessary, not the maximum possible – otherwise, as per the above, the maximum possible is about 16000 for utf8mb4, not 255 – and nobody would propose setting it to 16000, would they? But it’s not much different, in stored row space a VARCHAR(255) requires a 2 byte length indicator just like VARCHAR(16000) would.
So please review VARCHAR columns and set their definition to the maximum actually necessary, this is very unlikely to come out as 255. If 255, why not 300? Or rather 200? Or 60? Setting a proper number indicates that thought and data analysis has gone into the design. 255 looks sloppy.
TEXT
TEXT (and LONGTEXT) columns are handled different in MySQL/MariaDB. First, a recap of some facts related to TEXT columns.
The db server often needs to create a temporary table while processing a query. MEMORY tables cannot contain TEXT type columns, thus the temporary table created will be a disk-based one. Admittedly this will likely remain in the disk cache and never actually touch a disk, however it goes through file I/O functions and thus causes overhead – unnecessarily. Queries will be slower.
InnoDB can store a TEXT column on a separate page, and only retrieve it when necessary (this also means that using SELECT * is needlessly inefficient – it’s almost always better to specify only the columns that are required – this also makes code maintenance easier: you can scan the source code for referenced column names and actually find all relevant code and queries).
TEXT best practice
A TEXT column can contain up to 64k-1 in byte length (4G for LONGTEXT). So essentially a TEXT column can store the same amount of data as a VARCHAR column (since MySQL 5.0), and we know that VARCHAR offers us benefits in terms of server behaviour. Thus, any instance of TEXT should be carefully reviewed and generally the outcome is to change to an appropriate VARCHAR.
Using LONGTEXT is ok, if necessary. If the amount of data is not going to exceed say 16KB character length, using LONGTEXT is not warranted and again VARCHAR (not TEXT) is the most suitable column type.
Summary
Particularly when combined with the best practice of not using SELECT *, using appropriately defined VARCHAR columns (rather than VARCHAR(255) or TEXT) can have a measurable and even significant performance impact on application environments.
Applications don’t need to care, so the db definition can be altered without any application impact.
It is a worthwhile effort.

Contributing to popular frameworks for scalability

Right now we’re volunteering some engineering time to assisting the WordPress and WooCommerce people with scalability issues. In the past we’ve put similar efforts into Drupal.
There are many opinions on these systems out there, most of them sadly negative. We take a different view. Each of these frameworks obviously has their advantages and disadvantages, but the key question is why people use them. When we understand that, we can assess that reasoning, and assist further.
Obviously writing your own code all the way is going to potentially create the most optimal result for your site. A custom tool is going to be less code and more optimal for your situation. However, it also requires you to put in quite a bit of development effort both to create and to maintain that system, including security issues. When you’re big enough (as a site/company) this can be worthwhile, but more and more organisations actually appear to be moving towards utilising a CMS and then extending that to their needs using the usual module/plugin/extension model.
This also means that bigger and bigger sites use these systems, and thus we need to look at the scalability. CMS frameworks tend to be “db heavy”, caused by being highly configurable combined with a modular/abstraction architecture that sometimes goes for “code architectural correctness” without taking scaling into account. Naturally most such code works fine on a dev box and even on a modest live server. But add sufficient content, and things rapidly go downhill. Just like with other database related tuning issues, there is no gradual degradation in performance – when certain boundaries are reached, performance plummets to cause nasty page load times or even complete page load failures.
Are these systems inherently dreadful? Actually, no. They do a darn good job and even in security terms they’re not bad at all. Naturally one always has to be careful with modules/plugins and whether they are (still) maintained.
So, with that assessment out of the way – we can look at the actual issues. It makes sense for us to contribute to these systems as it directly benefits our clients, ourselves, and many many others.
 
Just no the option structure in WordPress has drawn our interest. It’s abstracted, and so a plugin will request the value of an individual option item (by name). Typically it causes a db query. WordPress has an ‘autoload’ mechanism particularly for its core settings, which it loads in one bigger query and caches – that makes sense.
We’ve just commented on an existing issue regarding the indexing of the autoload column, with people arguing that indexing a boolean field is not beneficial (not necessarily true and easily tested for a specific case – the outcomes is that in this case an index IS beneficial) and that having more indexes slows down writes (true of course, but that’s never an argument against proper indexing – also this is mostly a read table, so a bit of overhead on writes is fairly immaterial). Lastly there were comments re MyISAM which has a slightly different performance profile to InnoDB when testing this. But InnoDB has been the default engine for quite a few years now – any installation in recent years, and any installation where the owner needs to care for more performance and other factors, will already be using InnoDB. Taking MyISAM into account is not beneficial. We hope the index will shortly be added. In the mean time you can add it yourself:
ALTER TABLE wp_options ADD INDEX (autoload)
Using autoload for everything would not be efficient, there can be tens of thousands of options in a decently sized site. So we see plugins retrieve half a dozen options, one by one using the appropriate framework function, and each triggers a database query. Again we have to appreciate why this architecture is the way it is, and not waste time on arguing with that. There are sensible reasons for having such an option architecture. What we need to address is the inefficiency of triggering many queries.
Our proposal is to extend the get_option() function with an extra parameter for a plugin name (call it a domain), then the function can issue a single db query to retrieve all the options for that plugin and cache it in a single object. When another option is requested it can be retrieved from there rather than triggering another query. This reduces the number of db queries from N per plugin to 1, which is significant. Mind you, each of these queries is tiny and doesn’t require much lookup time. But running any query has intrinsic overhead so we want to reduce the number of queries whenever possible – not running superfluous queries, combining queries, and so on.
There are variations on the proposal as there are many aspects to consider, but it’s important to not leave it “as is” as currently it affects performance and scalability.
So how do we catch potential items of interest? On the database end, the slow query log is still very useful with the proper settings. log_queries_not_using indexes, min_examined_row_limit=1000 (tune appropriate for the environment), log_slow_filter and log_slow_verbosity to give as much info as possible, and long_query_time to something sub-second. Naturally, this presumes you’ve already taken care of real nasties otherwise you’ll get very big slow log files very quickly. You can analyse the slow query log using Percona Tools, but the ‘mysqldumpslow’ tool which is part of the server package is pretty useful also. You’re looking for queries that either take a long time, or that happen often, or just look odd/inefficient/unnecessary. Then hunt where they come from in the code, and figure out the why. Often reasons aren’t necessarily good or current (historical scenarios linger), but there will be reasons and understanding them will help you talk with developers more effectively – seeing where they come from.
WordPress has a nice plugin called Query Monitor. As admin you get details from a page load in your top admin bar, and you can delve into specific aspects. Things to look out for are an excessive number of queries for a page (a hundred is sadly common, but when you see into the thousands it’s worthwhile investigating – of course, start with the biggest problems first and work your way down), duplicated queries (indicating framework inefficiencies) and things like that. Have a good browse around, there’s much there that’s useful.
If you want, feel free to bring suspected issues to our attention. We’ll look at them, perhaps write about them here, and in any case follow up with the developers in their bug tracking system. It’s worthwhile.

Web Security: SHA1 SSL Deprecated

You may not be aware that the mechanism used to fingerprint the SSL certificates that  keep your access to websites encrypted and secure is changing. The old method, known as SHA1 is being deprecated – meaning it will no longer be supported. As per January 2016 various vendors will no longer support creating certificates with SHA1, and browsers show warnings when they encounter an old SHA1 certificate. Per January 2017 browsers will reject old certificates.
The new signing method, known as SHA2, has been available for some time. Users have had a choice of signing methods up until now, but there are still many sites using old certificates out there. You may want to check the security on any SSL websites you own or run!
To ensure your users’ security and privacy, force https across your entire website, not just e-commerce or other sections. You may have noticed this move on major websites over the last few years.
For more information on the change from SHA1 to SHA2 you can read:

https://konklone.com/post/why-google-is-hurrying-the-web-to-kill-sha-1

To test if your website is using a SHA1 or SHA2 certificate you can use one of the following tools:

https://www.sha2sslchecker.com/
https://shaaaaaaaaaaaaa.com/

Open Query also offers a Security Review package, in which we check on a broad range of issues in your system’s front-end and back-end and provide you with an assessment and recommendations. This is most useful if you are looking at a form of security certification.

Motivation to Migrate RDBMS

http://www.itnews.com/article/3004953/use-oracles-database-watch-out-for-this-dec-1-deadline.html

Companies that use a standard edition of Oracle’s database software should be aware that a rapidly approaching deadline could mean increased licensing costs.

Speaking from experience (at both MySQL AB and Open Query), typically, licensing/pricing changes such as these act as a motivator for migrations.
Migrations are a nuisance (doesn’t matter from/to what platform) and are best avoided as they’re intrinsically painful, costly and time-consuming. Smart companies know this.
When asked in generic terms, we generally recommend against migrations (even to MySQL/MariaDB) for the above-mentioned practical and business reasons. There are also technical reasons. I’ll list a few:

application, query and schema design tends to be most tuned to a particular RDBMS, usually the one the main developer(s) are familiar with. Features are used in a certain way, and the original target platform (even if non deliberate) is likely to execute most efficiently;
RDBMS choice drives hardware/network architecture. A migration should also include a re-think of this, to make optimal use of the database platform;
it’s quite rare (but not unheard of!) for an application to perform better on another platform, without putting a lot of extra work in. If extra work is on the table, then the original DB platform should also be considered as a valid option;
related to other points: a desire to migrate might be based on employees’ expertise with a particular platform rather than this particular application’s intrinsic suitability to that platform. While that can be a valid reason, it should be recognised as the actual reason as there are obviously cost/effort implications in terms of migration cost and other options such as training can be considered.

Nevertheless, a company that’s really annoyed by a vendor’s attitude can opt for the migration route, as they may decide it’s the path of less pain (and lower cost) in the long(er) term.
We do occasionally guide and assist with migrations, if after review it looks like a viable and sensible direction to take.

Slow Query Log Rotation

Some time ago, Peter Boros at Percona wrote this post: Rotating MySQL slow logs safely. It contains good info, such as that one should use the rename method for rotation (rather than copytruncate), and then connect to mysqld and issue a FLUSH LOGS (rather than send a SIGHUP signal).
So far so good. What I do not agree with is the additional construct to prevent slow queries from being written during log rotation. The author’s rationale is that if too many items get written while the rotation is in process, this can block threads. I understand this, but let’s review what actually happens.
Indeed, if one were to do lots of writes to the slow query log in a short space of time, a write could block while waiting.
Is the risk of this occurring greater during a logrotate operation? I doubt it. A FLUSH LOGS has to close and open the file. While there is no file open, no writes can occur anyhow and they may be stored in the internal buffer of the lowlevel MySQL code for this.
In any case, if there is such a high write rate, that is an issue in itself: it is not useful to have the slow query log write that fast. Instead, you’d up the long_query_time and min_examined_rows variables to reduce the effectively “flow rate”. It’s always best to resolve an underlying issue rather than its symptom(s).

Using Persistent Memory in RDBMS

People at Intel started the pmem library project some time ago, it’s open to the broader community at GitHub and  other developers, including Linux kernel devs, are actively involved.
While the library does allow interaction with an SSD using a good-old-filesystem, we know that addressing SSD through SATA or SAS is very inefficient. That said, the type of storage architecture that SSD uses does require significant management for write levelling and verifying so that the device as a whole actually lasts, and your data is kept safe: in theory you could write to an NVRAM chip, and not know when it didn’t actually store your data properly.
But there are other technologies, such as Memristor (RRAM) and Phase Change Memory (PCM, PRAM). Numonyx (founded by Intel and others, since acquired by Micron) was one of the companies developing PCM some years ago, to the point of some commercial applications. Somewhat oddly (in my opinion), Micron ditched their PCM line in 2014 focusing more on 3D NAND technology. In 2015, Intel and Micron announced that they were working on something called 3D XPoint but Micron denies that it’s based on PCM.
I like the concept of PCM because it has a lot of advantages over NAND technology. It’s very stable, doesn’t “bleed” to adjacent memory cells, if it writes correctly it’s stored correctly, and it’s fast. Not as fast as ordinary RAM, but it’s persistent! What I’ve been holding out for is just a small amount of PCM or similar storage in computers, phones, tablets and e-book readers.
In small mobile devices the advantage would be vastly reduced power consumption. ARM processors are able to put entire sections of the processor in standby to save power, but RAM needs to be powered and refreshed regularly. So with persistent memory, a device could maintain state while using hardly any power.
For RDBMS such as MySQL and MariaDB, persistent memory could be used for the InnoDB log files and other relatively small state information that needs to be persistently kept. So this storage would behave likely memory and be addressed as such (pmem uses mmap), but be persistent. So you could commit a transaction, your fsync is very quick, and the transactional information has been stored in a durable fashion. Very shiny, right?
It doesn’t need to be large, something like 512MB would be ample for RDBMS, and possibly much less for mobile devices.
I still reckon persistent memory space has huge potential – and I mention the mobile devices because that’s obviously a larger market. Previously Micron did work with Nokia on using NVM in phones, but as we all know Nokia was acquired and the Micron focus changed. I find the current state of it all quite disappointing, but I do hope the various players in this field will soon focus on this again properly and get the tech out there to be used!
If you happen to know of any current developments and activities, I’d like to hear about it!

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569