Check for file duplication and remove

Time:10-01

Basically, I want to remove duplicated files and directories that exist in two particular locations. I would like to create a script that checks the contents of directories "A" and "B" and, wherever a directory in "B" is also present in "A", removes it from "B".

EXAMPLE:

/some/path/a

dir1
    file1.ext
    file2.ext
    file3.ext
dir2
    file1.ext
    file2.ext
    file3.ext

/some/path/b

dir1
    file1.ext
    file2.ext
    file3.ext
dir3
    file1.ext
    file2.ext
    file3.ext

In this example, the desired outcome would be to recognize that "dir1" exists in both places and then remove "dir1" and its contents from /some/path/b leaving everything else alone. I have played around in the terminal trying to achieve these results and looked online for answers but haven't found anything that fits this particular use case. Any help would be much appreciated.

CodePudding user response:

Using bash and comm from GNU coreutils:

a=/some/path/a
b=/some/path/b

mapfile -d '' dirs_to_del <                           \
  <(comm -z12                                         \
    <(shopt -s nullglob; cd "$a" && printf '%s\0' */) \
    <(shopt -s nullglob; cd "$b" && printf '%s\0' */))
cd "$b" && echo rm -rf -- "${dirs_to_del[@]}"

Drop the echo if the output looks OK.
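To see what comm contributes here, the following is a simplified sketch of the same idea. It is not the NUL-safe version above: it is newline-delimited, so it assumes directory names contain no newlines, and the helper names list_dirs and common_dirs are made up for illustration.

```shell
# list_dirs DIR: print each subdirectory name of DIR on its own line.
list_dirs() (
  cd "$1" || exit 1
  for d in */; do
    # An unmatched glob stays literal ("*/"), so check before printing.
    if [ -d "$d" ]; then printf '%s\n' "$d"; fi
  done
)

# common_dirs A B: print the directory names present in both A and B.
common_dirs() (
  tmp=$(mktemp -d) || exit 1
  list_dirs "$1" | sort > "$tmp/a"   # comm expects sorted input
  list_dirs "$2" | sort > "$tmp/b"
  comm -12 "$tmp/a" "$tmp/b"         # -1 and -2 hide names unique to either side
  rm -rf -- "$tmp"
)
```

With the layout from the question, common_dirs /some/path/a /some/path/b prints dir1/ and nothing else, which is exactly the list the rm command needs.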

CodePudding user response:

You're looking for something like this:

A=/some/path/a
B=/some/path/b

shopt -s nullglob

for dir in "$B"/*/; do
  if test -d "$A${dir#"$B"}"; then
    echo rm -r -- "$dir"
  fi
done

Remove echo if you're happy with the output.
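If you expect to run this more than once, the loop can be wrapped in a function with the dry run as the default. This is only a sketch: the function name prune_common_dirs and the --delete switch are my own invention, and it checks each match with test -d instead of relying on nullglob.

```shell
# prune_common_dirs A B [--delete]
# Hypothetical wrapper: reports every top-level directory of B that also
# exists in A, and removes it only when --delete is given.
prune_common_dirs() (
  for dir in "$2"/*/; do
    [ -d "$dir" ] || continue        # unmatched glob stays literal; skip it
    name=${dir#"$2"/}                # e.g. "dir1/"
    if [ -d "$1/$name" ]; then
      if [ "${3:-}" = --delete ]; then
        rm -r -- "$dir"
      else
        printf 'would remove %s\n' "$dir"
      fi
    fi
  done
)
```

Run prune_common_dirs /some/path/a /some/path/b first to review the list, then repeat the call with --delete as the third argument.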

CodePudding user response:

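I use something like this, which removes from b every file that is identical to its counterpart in a:

diff -r -s a b | grep '^Files .* are identical$' | awk '{print $4}' | xargs rm

Note that awk and xargs split on whitespace, so this assumes file names without spaces.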

Given a folder structure like this:

$ tree
.
├── a
│   ├── dir1
│   │   ├── file1.txt
│   │   ├── file2.txt
│   │   └── file3.txt
│   └── dir2
│       ├── file1.txt
│       ├── file2.txt
│       └── file3.txt
└── b
    ├── dir1
    │   ├── file1.txt
    │   └── file2.txt
    └── dir3
        ├── file1.txt
        ├── file2.txt
        └── file3.txt

diff -r -s a b should show:

Files a/dir1/file1.txt and b/dir1/file1.txt are identical
Files a/dir1/file2.txt and b/dir1/file2.txt are identical
Only in a/dir1: file3.txt
Only in a: dir2
Only in b: dir3

The -s option is described in the diff man page as:

       -s, --report-identical-files
              report when two files are the same
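Parsing diff's report breaks as soon as a path contains whitespace. A possible alternative, sketched here under my own assumptions, compares each file with cmp instead. The function name remove_identical and the --delete switch are hypothetical, and like the pipeline it removes identical files only, leaving the directories in place.

```shell
# remove_identical A B [--delete]
# Hypothetical helper: for each regular file under B, compare it byte for byte
# with the file at the same relative path under A. Matches are only reported
# unless --delete is given.
remove_identical() (
  a=$(cd "$1" && pwd) || exit 1      # absolute path, since we cd into B below
  cd "$2" || exit 1
  find . -type f -exec sh -c '
    a=$1 mode=$2; shift 2
    for f do
      if [ -f "$a/$f" ] && cmp -s "$a/$f" "$f"; then
        if [ "$mode" = --delete ]; then
          rm -- "$f"
        else
          printf "would remove %s\n" "$f"
        fi
      fi
    done' sh "$a" "${3:-}" {} +
)
```

Calling remove_identical a b lists the matching files relative to b, and adding --delete removes them. Files that differ, or that exist on one side only, are left untouched.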