Almost every web application stores sensitive data: passwords, SSNs, credit card numbers, email addresses, etc. Splitting the database into secure and “public” parts allows you to restrict access to the sensitive data, both for end-users and for parts of the application.
This approach is based on MySQL encryption functions, or on client-side encryption where only the authorized user knows the secret; note that the encrypted data is still distributed to all slaves.
- If possible, use hashes with a sufficiently large salt, and do not store the real sensitive data in the database at all. Passwords are a good example: the end-user sends the login and password, the application/SQL code calculates a hash with a salt value unique to each end-user, and compares the hash with the value stored in the database. Even if an attacker obtains the hashes, it is still hard or even impossible to extract the real passwords for all users. Make sure that you are using a good random number generator for the salt, an application-side secret, and a good hash function (not MD5).
- Encryption is not suitable if you are going to provide public access to your database (e.g., via slave dumps in SQL/CSV/XML/JSON format).
- Encryption is a complex topic. Check this blog post for a good explanation of hashing usage, and try to find a security consultant if you are inventing some “new” method of storing and encrypting data.
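To make the salted-hashing idea above concrete, here is a minimal SQL sketch. The table and column names are illustrative, and SHA2 is used only because it is built into MySQL; in production, a dedicated password-hashing function (bcrypt, Argon2) on the application side is preferable.

```sql
CREATE TABLE users (
  login    VARCHAR(64) PRIMARY KEY,
  salt     VARBINARY(16) NOT NULL,  -- unique random salt per user
  pwd_hash VARBINARY(64) NOT NULL   -- SHA2-512 of salt + password
);

-- registration: generate a random salt, store only the hash
SET @salt = RANDOM_BYTES(16);
INSERT INTO users VALUES ('alice', @salt,
  UNHEX(SHA2(CONCAT(@salt, 's3cret password'), 512)));

-- login check: recompute the hash using the stored salt
SELECT COUNT(*) FROM users
WHERE login = 'alice'
  AND pwd_hash = UNHEX(SHA2(CONCAT(salt, 's3cret password'), 512));
```

The real password never hits the disk; only the salt and the hash are stored.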
Field encryption example
I’m using a single-server setup here, because the most important part of the data separation should be done on the application side. The secure part of the application knows a secret passphrase. For example, you can place the code working with authentication, full profiles and payments on a separate server and use a dedicated MySQL account.
```sql
create database encrypted;
use encrypted;
-- use varbinary, not varchar: AES_ENCRYPT() returns binary data
create table t(c1 int, c2 varbinary(255), rnd_pad varbinary(16), primary key(c1));

SET block_encryption_mode = 'aes-256-cbc';
SET @key_str = SHA2('My secret passphrase', 512);
SET @init_vector = RANDOM_BYTES(16);
insert into t (c1, c2, rnd_pad)
values (1, AES_ENCRYPT('Secret', @key_str, @init_vector), @init_vector);

-- decrypt data
select c1, AES_DECRYPT(c2, @key_str, rnd_pad) from t;
```
- GOOD: Master and slave servers have exactly the same data and no problems with replication.
- GOOD: Even if two different end-users have exactly the same password, the stored values are different due to random bytes in the init vector for AES encryption.
- GOOD: Both the encryption and the random number generation use an external library (OpenSSL).
- CONF: It’s important to have binlog_format=ROW to avoid sending the secret to slave servers.
- CONF: Do not allow end-users to change data without changing the init_vector, especially for small strings without random padding. Each update should cause init_vector re-generation.
- BAD: Encrypted data is still sent to slave servers. If the encryption algorithm or protocol is broken, it is possible to get access to data from an insecure part of the application.
- BAD: The described protocol still could be insecure.
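The CONF note about init_vector re-generation can be sketched as follows, continuing the example table t above: every update writes a fresh init vector together with the new ciphertext, so two identical plaintexts never produce the same stored value.

```sql
-- re-encrypt with a fresh init vector on every update
SET block_encryption_mode = 'aes-256-cbc';
SET @key_str = SHA2('My secret passphrase', 512);
SET @init_vector = RANDOM_BYTES(16);
UPDATE t
SET c2      = AES_ENCRYPT('New secret', @key_str, @init_vector),
    rnd_pad = @init_vector
WHERE c1 = 1;
```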
Both options could cause replication breakage. Replication filters were designed for STATEMENT-based replication and are problematic with a modern binlog_format=ROW + gtid_mode=ON setup. You can find several cases related to database-level slave-side filters in this blog post. If you still need slave-side filtering, use the per-table replicate-wild-*-table options.
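As a sketch, a slave-side per-table filter for the encrypted database from the example above would look like this in my.cnf (the database name is from the example; adjust the pattern to your schema):

```
[mysqld]
# per-table wildcard filter instead of a database-level one
replicate-wild-ignore-table=encrypted.%
```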
Even if binary logging is disabled for a specific database, a statement could still be stored in the binary log if it is a DDL statement, or if binlog_format is STATEMENT or MIXED and the default database is not used by the statement. For details, see the reference manual for the binlog-do-db option. In order to avoid replication issues, you should use ROW-based replication and run SET SESSION sql_log_bin=0; before each DDL statement executed against the ignored database. It’s not a good idea to use binlog-do-db, because you lose control of what should be replicated.
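The DDL pattern described above can be sketched like this (the table and column are hypothetical; the point is the sql_log_bin bracketing around every DDL statement touching the ignored database):

```sql
-- keep DDL against the ignored database out of the binary log
SET SESSION sql_log_bin = 0;
ALTER TABLE test.t ADD COLUMN c3 INT;
SET SESSION sql_log_bin = 1;
```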
Why is binary log filtering useful at all? Because the alternative, the sql_log_bin variable, is DANGEROUS: changing it is prohibited inside transactions, which makes consistent updates of multiple entities inside the database problematic. Please do not use it instead of binlog-ignore-db in production on the application side. If you need it for database administration, make sure that you always type the “session” keyword before sql_log_bin.
We still want the ability to hide just one column of a table. But if we are ignoring the whole database, we should provide a method for reading the non-secure data on slaves / by restricted MySQL accounts. This is possible with triggers and views:
```sql
set session sql_log_bin=0;
create database test;
create table test.t(c1 int, c2 int, primary key(c1));
set session sql_log_bin=1;

create database test_insecure;
create table test_insecure.t(c1 int, c2 int default NULL, primary key(c1));

use test;
delimiter //
create trigger t_aft_ins after insert on test.t
FOR EACH ROW BEGIN
  INSERT test_insecure.t (c1) values (NEW.c1);
END //
create trigger t_aft_upd after update on test.t
FOR EACH ROW BEGIN
  UPDATE test_insecure.t SET c1 = NEW.c1 WHERE c1 = OLD.c1;
END //
create trigger t_aft_del after delete on test.t
FOR EACH ROW BEGIN
  DELETE FROM test_insecure.t WHERE c1 = OLD.c1;
END //
delimiter ;

-- just on slave:
create database test;
create view test.t as select * from test_insecure.t;

-- typical usage
INSERT INTO test.t values(1,1234);
SELECT * from test.t; -- works on both master and slave; c2 will be NULL on the slave
```
- BAD: The data is not the same on the master and slaves. It potentially breaks replication. It’s not possible to use a slave’s backup to restore the master or promote the slave as a new master.
- BAD: Triggers could reduce DML statement performance.
- GOOD: The sensitive data is not sent to slaves at all (and not written to binary log).
- GOOD: It works with GTID.
- GOOD: It requires no application changes (or almost no application changes).
- GOOD: binlog-ignore-db allows us to not use the dangerous sql_log_bin variable after initial table creation.
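A sketch of the master-side configuration implied by the last point, assuming the secure database is named test as in the example above:

```
[mysqld]
binlog_format = ROW
# do not write changes to the secure database into the binary log
binlog-ignore-db = test
```

With this in place, sql_log_bin is only needed once, for the initial DDL, and the day-to-day DML against test never reaches the slaves.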