How do I compare if list of files from one directory exist in second directory and copy them to thir

Time:06-01

I have a list of 100 files in directory1 with .dll extension:

directory1/file1.dll
directory1/file2.dll
...
directory1/file100.dll

There is another directory, directory2, containing 10000 files with the .dll extension, spread across many subdirectories:

directory2/subdirectory1/file1.dll
directory2/subdirectory2/file3.dll
...
directory2/subdirectory3/file10000.dll

I need to check whether each of the 100 file names from directory1 also exists somewhere in directory2, and then copy the files that are found to directory3.

How can I do this in the most efficient way? Thank you in advance.

CodePudding user response:

Try this Shellcheck-clean code:

#! /bin/bash -p

dir1=directory1
dir2=directory2
dir3=directory3

shopt -s dotglob globstar nullglob

# Set up an associative array to record which files are in "$dir1"
declare -A is_in_dir1
for path in "$dir1"/*.dll; do
    file=${path##*/}
    is_in_dir1[$file]=1
done

for path in "$dir2"/**/*.dll; do
    file=${path##*/}
    if (( ${is_in_dir1[$file]-0} )); then
        cp -n -v -- "$path" "$dir3"
    fi
done
  • You'll need to change the dir1, dir2, and dir3 settings.
  • shopt -s sets some Bash configurations required by the code:
    • dotglob causes glob patterns (e.g. *.dll) to match names that begin with a dot (.). You might not want that.
    • globstar enables the use of ** to match paths recursively through directory trees.
    • nullglob makes globs expand to nothing when nothing matches (otherwise they expand to the glob pattern itself, which is almost never useful in programs).
  • See BashGuide/Arrays - Greg's Wiki (last section) for information about associative arrays in Bash.
  • See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${path##*/}.
  • See glob - Greg's Wiki for information about globbing in general, and globstar and the ** pattern in particular.
  • The code should work with any file or directory names (including ones with spaces or newlines in them).
  • I can't say that the code implements the "most efficient way", but it scans both "$dir1" and "$dir2" only once and I can't think of a way to make it significantly faster.
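
The two idioms the notes above point to, ${path##*/} and the ${array[key]-0} lookup, can be tried in isolation. This is only a minimal sketch: the sample path and the variable names known and unknown are made up for illustration and are not part of the answer's script.

```bash
#!/bin/bash
# Hypothetical sample path, for illustration only.
path='directory2/sub dir/file 1.dll'

# ${path##*/} strips the longest prefix matching "*/", leaving the file name.
file=${path##*/}
printf '%s\n' "$file"    # prints: file 1.dll

# Record one known name in an associative array (requires Bash 4 or later).
declare -A is_in_dir1
is_in_dir1[$file]=1

# ${is_in_dir1[key]-0} expands to 0 when the key is unset, so the
# arithmetic test is false for names that were never recorded.
known=0
unknown=0
if (( ${is_in_dir1[$file]-0} )); then
    known=1
fi
if (( ${is_in_dir1[other.dll]-0} )); then
    unknown=1
fi
printf 'known=%s unknown=%s\n' "$known" "$unknown"    # prints: known=1 unknown=0
```

The default value matters because an unset key would otherwise expand to an empty string, which is not a valid arithmetic expression on its own.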

CodePudding user response:

Assuming you have no spaces (or other unpleasant characters) in filenames:

cd /path/to/first_directory
for file in *.dll; do
    find /path/to/second_directory/ -name "$file" | xargs -I ABCD cp ABCD /path/to/third_dir
done

find searches subdirectories recursively. ABCD is a placeholder that xargs replaces with each path it reads. For details, see man xargs:

-I replace-str

Replace occurrences of replace-str in the initial-arguments with names read from standard input. Also, unquoted blanks do not terminate input items; instead the separator is the newline character. Implies -x and -L 1.
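
If the no-spaces assumption does not hold, one possible variation is to let find run cp itself through -exec, so that paths never pass through a pipe. This is a sketch under stated assumptions, not the method of the answer above: the function name copy_matches and its three arguments are hypothetical.

```bash
#!/bin/bash
# copy_matches SRC SEARCH DEST   (hypothetical helper name)
# For every *.dll directly inside SRC, copy any regular file with the same
# name found anywhere under SEARCH into DEST.
copy_matches() {
    local src=$1 search=$2 dest=$3 path
    for path in "$src"/*.dll; do
        # Skip the literal pattern when the glob matched nothing.
        [ -e "$path" ] || continue
        # -exec hands each found path to cp as a single argument, so spaces
        # and newlines in names are preserved. Note that -name takes a glob
        # pattern, so a name containing *, ? or [ may match more than itself.
        find "$search" -type f -name "${path##*/}" -exec cp -- {} "$dest" \;
    done
}
```

It could be called as copy_matches /path/to/first_directory /path/to/second_directory /path/to/third_dir. Like the loop above, it runs find once per file in the first directory, so it rescans the second directory each time.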
