Rsync Incremental Backup still copies all the files

Time:11-22

I am currently writing a bash script for rsync. I am pretty sure I am doing something wrong. But I can't tell what it is. I will try to elaborate everything in detail so hopefully someone can help me.

The goal of the script is to do full and incremental backups using rsync. Everything seems to work perfectly well, besides one crucial thing: even with the --link-dest parameter, it still seems to copy all the files. I have checked the file sizes with du -chs.

First here is my script:

#!/bin/bash
while getopts m:p: flags
do
  case "$flags" in
    m) mode=${OPTARG};;
    p) prev=${OPTARG};;
    *) echo "usage: $0 [-m] [-p]" >&2
       exit 1 ;;
  esac
done

date="$(date '+%Y-%m-%d')"


#Create Folders If They Do Not Exist (-p parameter)
mkdir -p /Backups/Full && mkdir -p /Backups/Inc

FullBackup() {
  #Backup Content Of Website
  mkdir -p /Backups/Full/$date/Web/html
  rsync -av user@IP:/var/www/html/ /Backups/Full/$date/Web/html/

  #Backup All Config Files NEEDED. Saving Storage Is Key ;)
  mkdir -p /Backups/Full/$date/Web/etc
  rsync -av user@IP:/etc/apache2/ /Backups/Full/$date/Web/etc/

  #Backup Fileserver
  mkdir -p /Backups/Full/$date/Fileserver
  rsync -av user@IP:/srv/samba/private/ /Backups/Full/$date/Fileserver/

  #Backup MongoDB
  ssh user@IP /usr/bin/mongodump --out /home/DB
  rsync -av root@BackupServerIP:/home/DB/ /Backups/Full/$date/DB
  ssh user@IP rm -rf /home/DB
}

IncrementalBackup(){
  Method="";
  if [ "$prev" == "full" ]
  then
    Method="Full";
  elif [ "$prev" == "inc" ]
  then
    Method="Inc";
  fi

  if [ -z "$prev" ]
  then
  echo "-p Parameter Empty";
  else
  #Get Latest Folder - Ignore the hacky method, it works.
  cd /Backups/$Method
  NewestBackup=$(find . ! -path . -type d | sort -nr | head -1 | sed s@^./@@)
  IFS='/'
  read -a strarr <<< "$NewestBackup"
  Latest_Backup="${strarr[0]}";
  cd /Backups/

  #Incremental-Backup Content Of Website
  mkdir -p /Backups/Inc/$date/Web/html
  rsync -av --link-dest /Backups/$Method/"$Latest_Backup"/Web/html/ user@IP:/var/www/html/ /Backups/Inc/$date/Web/html/

  #Incremental-Backup All Config Files NEEDED
  mkdir -p /Backups/Inc/$date/Web/etc
  rsync -av --link-dest /Backups/$Method/"$Latest_Backup"/Web/etc/ user@IP:/etc/apache2/ /Backups/Inc/$date/Web/etc/

  #Incremental-Backup Fileserver
  mkdir -p /Backups/Inc/$date/Fileserver
  rsync -av --link-dest /Backups/$Method/"$Latest_Backup"/Fileserver/ user@IP:/srv/samba/private/ /Backups/Inc/$date/Fileserver/

  #Backup MongoDB
  ssh user@IP /usr/bin/mongodump --out /home/DB
  rsync -av root@BackupServerIP:/home/DB/ /Backups/Full/$date/DB
  ssh user@IP rm -rf /home/DB
  fi
}

if [ "$mode" == "full" ]
then
  FullBackup;
elif [ "$mode" == "inc" ]
then
  IncrementalBackup;
fi

The commands I used:

Full backup: bash script.sh -m full

Incremental backup: bash script.sh -m inc -p full

Executing the script is not giving any errors at all. As I mentioned above, it just seems like it's still copying all the files. Here are some tests I did.

Output of du -chs

root@Backup:/Backups# du -chs /Backups/Full/2021-11-20/*
36K     /Backups/Full/2021-11-20/DB
6.5M    /Backups/Full/2021-11-20/Fileserver
696K    /Backups/Full/2021-11-20/Web
7.2M    total
root@Backup:/Backups# du -chs /Backups/Inc/2021-11-20/*
36K     /Backups/Inc/2021-11-20/DB
6.5M    /Backups/Inc/2021-11-20/Fileserver
696K    /Backups/Inc/2021-11-20/Web
7.2M    total

Output of ls -li

root@Backup:/Backups# ls -li /Backups/Full/2021-11-20/
total 12
1290476 drwxr-xr-x 4 root root 4096 Nov 20 19:26 DB
1290445 drwxrwxr-x 6 root root 4096 Nov 20 18:54 Fileserver
1290246 drwxr-xr-x 4 root root 4096 Nov 20 19:26 Web
root@Backup:/Backups# ls -li /Backups/Inc/2021-11-20/
total 12
1290506 drwxr-xr-x 4 root root 4096 Nov 20 19:28 DB
1290496 drwxrwxr-x 6 root root 4096 Nov 20 18:54 Fileserver
1290486 drwxr-xr-x 4 root root 4096 Nov 20 19:28 Web

Rsync Output when doing the incremental backup and changing/adding a file

receiving incremental file list
./
lol.html

sent 53 bytes  received 194 bytes  164.67 bytes/sec
total size is 606  speedup is 2.45
receiving incremental file list
./

sent 33 bytes  received 5,468 bytes  11,002.00 bytes/sec
total size is 93,851  speedup is 17.06
receiving incremental file list
./

sent 36 bytes  received 1,105 bytes  760.67 bytes/sec
total size is 6,688,227  speedup is 5,861.72
*Irrelevant MongoDB Dump Text*

sent 146 bytes  received 2,671 bytes  1,878.00 bytes/sec
total size is 2,163  speedup is 0.77

I suspect that the ./ has something to do with that. I might be wrong, but it looks suspicious. When I execute the same command again, though, the ./ entries are not in the log, probably because I ran it on the same day, so it was writing into the existing /Backups/Inc/2021-11-20 folder.

Let me know if you need more information. I have been experimenting for a long time now. Maybe I am simply wrong, and the links are being made and disk space is being saved.

Answer:

I didn't read the entire code because the main problem doesn't seem to lie there.
Verify the disk usage of your /Backups directory with du -sh /Backups and then compare it with the sum of du -sh /Backups/Full and du -sh /Backups/Inc.

I'll show you why with a little test:

Create a directory containing a file of 1 MiB:

mkdir -p /tmp/example/data

dd if=/dev/zero of=/tmp/example/data/zerofile bs=1M count=1

Do a "full" backup:

rsync -av /tmp/example/data/ /tmp/example/full

Do an "incremental" backup

rsync -av --link-dest=/tmp/example/full /tmp/example/data/ /tmp/example/incr

Now let's see what we got:

with ls -l

ls -l /tmp/example/*
-rw-rw-r-- 1 user group 1048576 Nov 21 00:24 /tmp/example/data/zerofile
-rw-rw-r-- 2 user group 1048576 Nov 21 00:24 /tmp/example/full/zerofile
-rw-rw-r-- 2 user group 1048576 Nov 21 00:24 /tmp/example/incr/zerofile

and with du -sh

du -sh /tmp/example/*
1.0M    /tmp/example/data
1.0M    /tmp/example/full
0   /tmp/example/incr
Oh? There is a 1 MiB file in /tmp/example/incr, but du missed it?

Actually no. Because the file had not been modified since the previous backup (referenced with --link-dest), rsync created a hard link to it instead of copying its content. A hard link is an additional name for the same data on disk, so the content is stored only once.
du detects hard links and counts each file only once per invocation, so the shared file was attributed to /tmp/example/full, which came first in the argument list. The same file shows up at its full size as soon as you run du -sh on /tmp/example/incr alone:

du -sh /tmp/example/incr
1.0M    /tmp/example/incr
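
Since du counts a hard-linked file only once per invocation, the space an incremental snapshot really adds can be read from the second line of a two-argument du call. A minimal sketch, assuming GNU du and awk; the function name extra_usage is made up for illustration:

```shell
#!/bin/bash
# extra_usage PREV NEW
# Print the KiB that NEW occupies on top of PREV. du is given both
# directories in a single call, so files in NEW that are hard links to
# files already counted under PREV are not counted a second time.
extra_usage() {
  du -sk "$1" "$2" | awk 'NR == 2 { print $1 }'
}
```

For the question's layout this would be called as extra_usage /Backups/Full/2021-11-20 /Backups/Inc/2021-11-20. A result close to zero means the incremental run linked the files instead of copying them.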
How do you detect that there are hard links to a file?

ls -l actually showed it to us:

-rw-rw-r-- 2 user group 1048576 Nov 21 00:24 /tmp/example/full/zerofile
           ^
          HERE

This number is the file's link count. Here it means that two names (hard links) on the same filesystem refer to this file's data: this one and one other.
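
The same check can be scripted instead of read off the ls -l output. This is a small sketch, assuming GNU coreutils (stat -c) and a shell whose test supports -ef; the helper names link_count and same_inode are illustrative only:

```shell
#!/bin/bash
# link_count FILE
# Print how many names (hard links) refer to FILE's data.
# Assumes GNU stat; BSD/macOS stat uses a different flag syntax.
link_count() {
  stat -c '%h' -- "$1"
}

# same_inode A B
# Succeed when A and B are two names for the same file on disk.
same_inode() {
  [ "$1" -ef "$2" ]
}
```

With the test directories above, same_inode /tmp/example/full/zerofile /tmp/example/incr/zerofile should succeed if rsync linked the file. To list every name of a file, find /tmp/example -samefile /tmp/example/full/zerofile can be used.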


About your code

It doesn't change the outcome, but I would replace:

  #Get Latest Folder - Ignore the hacky method, it works.
  cd /Backups/$Method
  NewestBackup=$(find . ! -path . -type d | sort -nr | head -1 | sed s@^./@@)
  IFS='/'
  read -a strarr <<< "$NewestBackup"
  Latest_Backup="${strarr[0]}";
  cd /Backups/

with:

  #Get Latest Folder
  glob='20[0-9][0-9]-[0-1][0-9]-[0-3][0-9]' # match a timestamp (more or less)
  NewestBackup=$(compgen -G "/Backups/$Method/$glob/" | sort -nr | head -n 1)
  Latest_Backup=$(basename "$NewestBackup") # directory name used by the rsync calls
  • glob makes sure that the directories/files found by compgen -G will have the right format.
  • Adding / at the end of a glob makes sure that it matches directories only.
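
The lookup can also be wrapped in a function that reports when no snapshot exists. The sketch below is a variation, not the snippet above: it uses a plain glob loop instead of compgen -G (glob matches are expanded in sorted order, so the last match is the newest date), and the function name latest_backup is made up for illustration:

```shell
#!/bin/bash
# latest_backup DIR
# Print the name of the newest YYYY-MM-DD snapshot directory inside DIR.
# Return 1 when DIR contains no such directory.
latest_backup() {
  local dir="$1"
  local newest=''
  local d
  # The trailing slash restricts the glob to directories.
  for d in "$dir"/20[0-9][0-9]-[0-1][0-9]-[0-3][0-9]/; do
    # An unmatched glob stays a literal string; skip it.
    [ -d "$d" ] || continue
    newest=$d
  done
  [ -n "$newest" ] || return 1
  basename "$newest"
}
```

In the script this could be used as Latest_Backup=$(latest_backup "/Backups/$Method") || exit 1, so an incremental run stops instead of calling rsync with an empty --link-dest path.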