Free, Fast MongoDB Hot Backup with Percona Server for MongoDB

MongoDB Hot Backups

In this blog post, we will discuss the MongoDB Hot Backup feature in Percona Server for MongoDB and how it can help you get a safe backup of your data with minimal impact.

Percona Server for MongoDB

Percona Server for MongoDB is Percona’s open-source fork of MongoDB, aimed at 100% feature compatibility (much like our MySQL fork). We have added a few extra features to our fork, for free, that are otherwise only available in the MongoDB Enterprise binaries for an additional fee.

The feature pertinent to this article is our free, open-source Hot Backup feature for WiredTiger and RocksDB, only available in Percona Server for MongoDB.

Essentially, this Hot Backup feature adds a MongoDB server command that creates a full binary backup of your data set to a new directory with no locking or impact to the database, aside from some increased resource usage due to copying the data.

It’s important to note these backups are binary-level backups, not logical backups (such as mongodump would produce).

Logical vs. Binary Backups

Before the Hot Backup feature existed, the only ways to back up a MongoDB instance, replica set or cluster were the logical backup tool ‘mongodump’ or block-device (binary) snapshots.

A “binary-level” backup means backup data contains the data files (WiredTiger or RocksDB) that MongoDB stores on disk. This is different from the BSON representation of the data that ‘mongodump’ (a “logical” backup tool) outputs.

Binary-level backups are generally faster than logical backups because a logical backup (mongodump) requires the server to read all data and return it to the MongoDB Client API. Once received, ‘mongodump’ serializes the payload into .bson files on disk. Important structures like indices are not backed up in full; only the metadata describing each index is saved. On restore, the entire process is reversed: ‘mongorestore’ must deserialize the data created by ‘mongodump’, send it over the MongoDB Client API to the server, and then the server’s storage engine must translate it into data files on disk. Because only index metadata was backed up, all indices must be recreated at restore time, and this is a serial operation for each collection! I have personally restored several databases where the majority of the restore time was the index rebuilds and NOT the restore of the raw data itself.
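
For comparison, a logical backup-and-restore cycle looks roughly like the sketch below (the host, port and output path are placeholders, not taken from the original post):

$ mongodump --host 127.0.0.1 --port 27017 --out /data/dump27017
$ # ...at restore time, every document is re-inserted and every index rebuilt:
$ mongorestore --host 127.0.0.1 --port 27017 --drop /data/dump27017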

In contrast, binary-level backups are much simpler: the storage-engine representation of the data is what is backed up. This style of backup includes the real index data, meaning a restore is as simple as copying the backup directory to be in the location of the MongoDB dbPath and restarting MongoDB. No recreation of indices is necessary! For very large datasets, this can be a massive win. Hours or even days of restore time can be saved due to this efficiency.

Of course, there are always some tradeoffs. Binary-level backups can take more space on disk, and care must be taken to ensure the files are restored to the same version of MongoDB on a matching CPU architecture. Generally, storing the MongoDB configuration file and the server version number alongside your backups addresses this concern.
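
As a minimal sketch, capturing that context next to the backup could look like this (the paths are examples only and assume the backup directory used later in this post):

$ cp /etc/mongod.conf /data/backup27017/mongod.conf.saved
$ mongod --version > /data/backup27017/mongod.version.txt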

‘createBackup’ Command

The Hot Backup feature is triggered by a simple admin command via the ‘mongo’ shell named ‘createBackup’. This command requires only one input: the path to output the backup to, named ‘backupDir’. This backup directory must not exist or an error is returned.

If you have MongoDB Authorization enabled (I hope you do!), this command requires the built-in ‘backup’ role, or a role that inherits from it.
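
For example, a dedicated backup user with the built-in ‘backup’ role could be created like this (the user name and password below are placeholders):

> db.getSiblingDB("admin").createUser({
    user: "backupuser",
    pwd: "use-a-strong-password",
    roles: [ { role: "backup", db: "admin" } ]
  })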

An example of ‘createBackup’ in the ‘mongo’ shell:

> db.adminCommand({
    createBackup: 1,
    backupDir: "/data/backup27017"
  })
{ "ok" : 1 }

When this command returns “ok”, a full backup is available to be read at the location specified in ‘backupDir’. The end result is similar to a block-device snapshot such as an LVM snapshot, but with less overhead than the 10-30% write degradation many users report while an LVM snapshot is held.

This backup directory can be deleted with a regular UNIX/Linux “rm -rf ...” command once it is no longer required. A typical deployment archives this directory and/or uploads the backup to a remote location (Rsync, NFS, AWS S3, Google Cloud Storage, etc.) before removing the backup directory, as in the sketch below.
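
One possible archive-and-ship sequence, assuming the ‘aws’ CLI and a hypothetical S3 bucket (any remote storage and transfer tool works just as well):

$ tar -czf /data/backup27017.tar.gz -C /data backup27017
$ aws s3 cp /data/backup27017.tar.gz s3://my-backup-bucket/mongodb/backup27017.tar.gz
$ rm -rf /data/backup27017 /data/backup27017.tar.gz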

WiredTiger vs. RocksDB Hot Backups

Although the ‘createBackup’ command is syntactically the same for WiredTiger and RocksDB, there are some big differences in the implementation of the backup behind-the-scenes.

RocksDB is a storage engine available in Percona Server for MongoDB that uses a level-tiered compaction strategy and is highly write-optimized. Because RocksDB uses immutable (write-once) files on disk, it can provide much more efficient backups by creating filesystem “hardlinks” that point to the same inode on disk. For large data sets this means dramatically less overhead to create a backup.

If the ‘createBackup’ command on a RocksDB-based server uses an output path on the same disk volume as the MongoDB dbPath (a very important requirement), hardlinks are used instead of copying most of the database data! If only 5% of the data changes during the backup, only 5% of the data is duplicated/copied. This makes backups potentially much faster than WiredTiger, which needs to make a full copy of the data and therefore uses twice as much disk space.

Here is an example of a ‘createBackup’ command on a RocksDB-based mongod instance that uses ‘/data/mongod27017’ as its dbPath:

$ mongo --port=27017
test1:PRIMARY> db.adminCommand({
    createBackup: 1,
    backupDir: "/data/backup27017.rocksdb"
})
{ "ok" : 1 }
test1:PRIMARY> quit()

Since we received { “ok”: 1 }, the backup is ready at our output path. Let’s take a look:

$ cd /data/backup27017.rocksdb
$ ls -alh
total 4.0K
drwxrwxr-x. 3 tim tim  36 Mar  6 15:25 .
drwxr-xr-x. 9 tim tim 147 Mar  6 15:25 ..
drwxr-xr-x. 2 tim tim 138 Mar  6 15:25 db
-rw-rw-r--. 1 tim tim  77 Mar  6 15:25 storage.bson
$ cd db
$ ls -alh
total 92K
drwxr-xr-x. 2 tim tim  138 Mar  6 15:25 .
drwxrwxr-x. 3 tim tim   36 Mar  6 15:25 ..
-rw-r--r--. 2 tim tim 6.4K Mar  6 15:21 000013.sst
-rw-r--r--. 2 tim tim  18K Mar  6 15:22 000015.sst
-rw-r--r--. 2 tim tim  36K Mar  6 15:25 000017.sst
-rw-r--r--. 2 tim tim  12K Mar  6 15:25 000019.sst
-rw-r--r--. 1 tim tim   16 Mar  6 15:25 CURRENT
-rw-r--r--. 1 tim tim  742 Mar  6 15:25 MANIFEST-000008
-rw-r--r--. 1 tim tim 4.1K Mar  6 15:25 OPTIONS-000005

Inside the RocksDB ‘db’ subdirectory we can see the .sst files containing the data! As this MongoDB instance stores its data on the same disk volume at ‘/data/mongod27017’, let’s prove that RocksDB created “hardlinks” instead of a full copy of the data.

First, we get the Inode number of an example .sst file using the ‘stat’ command. I chose the RocksDB data file: ‘000013.sst’:

$ stat 000013.sst
  File: ‘000013.sst’
  Size: 6501      	Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d	Inode: 33556899    Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1000/     tim)   Gid: ( 1000/     tim)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2018-03-06 15:21:10.735310581 +0200
Modify: 2018-03-06 15:21:10.738310479 +0200
Change: 2018-03-06 15:25:56.778556981 +0200
 Birth: -

Notice the Inode number for this file is 33556899. Next, the ‘find’ command can be used to locate all files pointing to Inode 33556899 under /data:

$ find /data -inum 33556899
/data/mongod27017/db/000013.sst
/data/backup27017.rocksdb/db/000013.sst

Using the ‘-inum’ (inode number) flag of ‘find’, we can see that the ‘000013.sst’ files in both the live MongoDB instance (/data/mongod27017) and the backup (/data/backup27017.rocksdb) point to the same inode on disk, meaning this file was NOT duplicated or copied during the hot backup process. Only metadata was written to point to the same inode! Now imagine this file was 1TB+ and this becomes very impressive!
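
A quick way to see how little extra space the backup actually consumes is to sum both directories in a single ‘du’ invocation; GNU ‘du’ counts hard-linked files only once per run, so the backup directory contributes only its non-hardlinked portion (GNU coreutils behavior assumed):

$ du -sh /data/mongod27017 /data/backup27017.rocksdb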

Restore Time

It bears repeating that restoring a logical, mongodump-based backup is very slow: indices are rebuilt serially for each collection, and both mongorestore and the server spend time translating the data from its logical to its binary representation.

Sadly, backup restore times are rarely tested, and I’ve seen large MongoDB users disappointed to find that a logical-backup restore takes several hours, an entire day or even longer while their production system is on fire.

Thankfully, binary-level backups are very easy to restore: the backup directory is copied to the location of the (stopped) MongoDB instance’s dbPath, and the instance is then started with the same configuration file and version of MongoDB. No indices are rebuilt and no time is spent rebuilding data files from logical representations!
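
A minimal restore sketch, reusing the dbPath and backup paths from the RocksDB example above (the systemd service name and the mongod user/group are assumptions; adjust them to your deployment):

$ sudo systemctl stop mongod
$ sudo rsync -a --delete /data/backup27017.rocksdb/ /data/mongod27017/
$ sudo chown -R mongod:mongod /data/mongod27017
$ sudo systemctl start mongod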

Percona-Labs/mongodb_consistent_backup Support

We have plans for our Percona-Labs/mongodb_consistent_backup tool to support the ‘createBackup’/binary-level method of backups in the future. See more about this tool in this Percona Blog post: https://www.percona.com/blog/2017/05/10/percona-lab-mongodb_consistent_backup-1-0-release-explained.

Currently, this project supports cluster-consistent backups using ‘mongodump’ (logical backups only), which are very time consuming for large systems.

Support for ‘createBackup’ in this tool would greatly reduce the overhead and time required to back up clusters, but it requires some added complexity to support the remote filesystems it would use.

Conclusion

As this article outlines, there are a lot of exciting developments that have reduced the impact of taking backups of your systems. More important than the backup efficiency is the time it takes to recover your important data from backup, an area where “hot” binary backups are the clear winner by leaps and bounds.

If MongoDB hot backups and topics like this interest you, please see the document below: we are hiring!

{
  hiring: true,
  role: "Consultant",
  tech: "MongoDB",
  location: "USA",
  moreInfo: "https://www.percona.com/about-percona/careers/mongodb-consultant-usa-based"
}

