Compare the contents of all files inside a directory

Time:08-02

I have a directory with multiple txt files. I need to compare the contents of all the txt files and print "Yay, all files are same" if they are identical, or "Oops, Files are not the same" otherwise.

cd tmp_dir
ls
abc.txt cde.txt fgh.txt ... xyz.txt

[ tmp_dir ]$ cat abc.txt
2022-08-01_20:14:36
[ tmp_dir ]$ cat def.txt
2022-08-01_07:40:29
[ tmp_dir ]$

How do I loop through the files and compare their contents?

    for file in tmp_dir/*; do
        if [ -f "$file" ]; then
            cmp -s -- files # need to compare all the files under directory
        fi
    done

Expected Output:

#If contents are same [Output should be in Green color]
Yay, all files are same

#If contents are not same [In red color]
Oops, Files are not the same

CodePudding user response:

This can be done in a single pipeline by hashing all of the files and then counting how many unique hashes you get. If the answer is 1, all of the files are the same.

distinct_hashes="$(
    find dir/ -type f -exec sha512sum {} + |    # hash all files in `dir/`
        awk '{print $1}' |                      # strip file names from output
        sort -u |                               # remove duplicate hashes
        wc -l                                   # count distinct hashes
)"

case "$distinct_hashes" in
    0) echo "no files";;
    1) echo "all the same";;
    *) echo "not all the same";;
esac

Alternatively, you could use cmp as you tried, and it would be more efficient: cmp computes no hash and can stop at the first differing byte. You'll just have to loop over the files manually. Note that you don't have to compare all pairs of files, which would be O(n²). Because file equality is transitive, comparing every file against one reference file is enough, and that keeps it O(n).

first_file=
same=1

for file in dir/*; do
    [[ -f "$file" ]] || continue

    if [[ -z "$first_file" ]]; then
        first_file="$file"
    elif ! cmp -s "$file" "$first_file"; then
        same=0
        break
    fi
done

if [[ -z "$first_file" ]]; then
    echo "no files"
elif ((same)); then
    echo "all the same"
else
    echo "not all the same"
fi
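
The question also asked for green and red output, which neither snippet above produces. A minimal sketch of one way to add it, assuming the terminal understands ANSI color escape sequences; the function name print_verdict is made up for illustration and is not part of the original answer:

```shell
# print_verdict is a hypothetical helper name, not part of the original answer.
# Argument: 1 if all files matched, anything else if they did not.
print_verdict() {
    if [ "$1" = 1 ]; then
        # \033[32m switches the foreground to green, \033[0m resets it.
        printf '\033[32m%s\033[0m\n' 'Yay, all files are same'
    else
        # \033[31m switches the foreground to red.
        printf '\033[31m%s\033[0m\n' 'Oops, Files are not the same'
    fi
}
```

With the cmp loop, calling print_verdict "$same" in place of the two final echo lines would print the colored message. If the output may be redirected to a file, guarding the colors with [ -t 1 ] is a common refinement.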

Advanced shell scripters might point out the quotes in distinct_hashes="$(...)", case "$distinct_hashes", [[ -f "$file" ]], [[ -z "$first_file" ]], and first_file="$file" are unnecessary. I like to include optional quotes. Quoting variable expansions is a really important habit to develop and not everyone will know the intricacies of when they are and aren't required.
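
To illustrate why the habit matters, here is a small sketch of a case where the quotes are not optional: the single-bracket test used in the question. The directory and file name are made up for the demo, and it assumes mktemp returns a path without spaces:

```shell
# Hypothetical demo: a file whose name contains a space.
demo_dir=$(mktemp -d)
file="$demo_dir/my file.txt"
printf 'hello\n' > "$file"

# Quoted: [ receives the path as one argument and finds the file.
if [ -f "$file" ]; then quoted=ok; else quoted=failed; fi

# Unquoted: the expansion is split at the space into two words, so [
# gets an extra argument and errors out instead of finding the file.
if [ -f $file ] 2>/dev/null; then unquoted=ok; else unquoted=failed; fi

echo "quoted: $quoted, unquoted: $unquoted"
rm -rf "$demo_dir"
```

The same splitting would affect an unquoted cmp -s $file $first_file, which is why the loop quotes both arguments.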
