Here is content of MongoDB
/data/db
folder content. Where one of the collections is 723GB. And other collections just few KB.
-rw------- 1 lxd docker 723G Dec 5 10:15 collection-0-1080408413244540209.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 collection-0-3112968025499504303.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 collection-2-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 collection-4-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:15 collection-7-1080408413244540209.wt
-rw------- 1 lxd docker 0 Dec 5 10:14 .dbshell
drwx------ 2 lxd docker 90 Dec 5 10:15 diagnostic.data
-rw------- 1 lxd docker 8.1G Dec 5 10:15 index-1-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:14 index-1-3112968025499504303.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 index-3-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 index-5-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 index-6-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:14 index-8-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:15 index-9-1080408413244540209.wt
drwx------ 2 lxd docker 110 Dec 5 10:15 journal
-rw------- 1 lxd docker 36K Dec 5 10:15 _mdb_catalog.wt
-rw------- 1 lxd docker 0 Dec 5 10:15 mongod.lock
-rw------- 1 lxd docker 36K Dec 5 10:15 sizeStorer.wt
-rw------- 1 lxd docker 114 Dec 5 10:14 storage.bson
-rw------- 1 lxd docker 50 Dec 5 10:14 WiredTiger
-rw------- 1 lxd docker 4.0K Dec 5 10:15 WiredTigerHS.wt
-rw------- 1 lxd docker 21 Dec 5 10:14 WiredTiger.lock
-rw------- 1 lxd docker 1.5K Dec 5 10:15 WiredTiger.turtle
-rw------- 1 lxd docker 84K Dec 5 10:15 WiredTiger.wt
- I simply backup whole docker volume to another server via rsync.
- I never change content of the 723GB collection, but for some reason MongoDB update the file approximately once in a week.
- Because of that rsync also update that file remotely. And because I'm using snapshots, every week new snapshot add another 723GB to the storage, that is unacceptable and cause me the problems.
To resolve that problem, I simply added 723GB collection into rsync exception and do not upload it anymore. Is that fine? May I after 1 year still use my backup to restore the server if I do not update collection-0-1080408413244540209.wt
file any more?
CodePudding user response:
By default, rsync only copies new or changed files from a source to destination so you dont need to add the file to exception list if you copy the mounted snapshot files. From the other hand the wiredTiger is a bit sensitive and generate checksum in the data root folder based on checkpoints from all collections so in case the file differ there is a big chance the mongod process to not be able to start from the restored snapshot. So I would suggest to not exclude the file completely but leave to rsync to check every time if file is same or differ and need to be copied again or not.
P.S. Note that the scenario you have described was valid with the deprecated previous mongoDB storage engine mmapv1 where it was even possible to copy the collection inside running instance on the fly , unfortunately the wiredTiger do not allow it , but offer other advantages.