I have a directory with multiple txt files. I need to compare the contents of all of them and print "Yay, all files are same" if they match, or "Oops, Files are not the same" otherwise.
cd tmp_dir
ls
abc.txt def.txt fgh.txt ... xyz.txt
[ tmp_dir ]$ cat abc.txt
2022-08-01_20:14:36
[ tmp_dir ]$ cat def.txt
2022-08-01_07:40:29
[ tmp_dir ]$
How can I loop through the files and compare their contents?
for file in tmp_dir/*; do
if [ -f "$file" ]; then
cmp -s -- files # need to compare all the files under directory
fi
done
Expected Output:
#If contents are same [Output should be in Green color]
Yay, all files are same
#If contents are not same [In red color]
Oops, Files are not the same
CodePudding user response:
This can be done in a single pipeline by hashing all of the files and then counting how many unique hashes you get. If the answer is 1, all of the files are the same.
distinct_hashes="$(
find dir/ -type f -exec sha512sum {} + | # hash all files in `dir/`
awk '{print $1}' | # strip file names from output
sort -u | # remove duplicate hashes
wc -l # count distinct hashes
)"
case "$distinct_hashes" in
0) echo "no files";;
1) echo "all the same";;
*) echo "not all the same";;
esac
Alternatively, you could use cmp as you tried, and it would be more efficient, because cmp stops at the first differing byte instead of reading and hashing every file in full. You'll just have to loop over the files manually. Note that you don't have to compare all pairs of files, which would be O(n^2); you can keep it O(n) by comparing every file to a single reference file (here, the first one found).
first_file=
same=1
for file in dir/*; do
[[ -f "$file" ]] || continue
if [[ -z "$first_file" ]]; then
first_file="$file"
elif ! cmp -s "$file" "$first_file"; then
same=0
break
fi
done
if [[ -z "$first_file" ]]; then
echo "no files"
elif ((same)); then
echo "all the same"
else
echo "not all the same"
fi
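The question also asked for colored output, which neither snippet above produces. Here is a minimal sketch of one way to add it, assuming a terminal that understands ANSI escape sequences. The function names files_identical and report are hypothetical wrappers I made up around the cmp loop, and the code sticks to POSIX sh syntax so it also runs outside bash:

```shell
# Hypothetical helper: exit status 0 if every regular file in directory $1
# has identical contents, 1 if any file differs, 2 if there are no files.
# The body is a subshell ( ... ) so its variables do not leak to the caller.
files_identical() (
    first_file=
    for file in "$1"/*; do
        [ -f "$file" ] || continue
        if [ -z "$first_file" ]; then
            first_file=$file
        elif ! cmp -s "$file" "$first_file"; then
            return 1
        fi
    done
    [ -n "$first_file" ] || return 2
    return 0
)

# Hypothetical helper: print the result for directory $1, green on success
# and red on mismatch. \033[32m / \033[31m select the color, \033[0m resets it.
report() (
    status=0
    files_identical "$1" || status=$?
    case $status in
        0) printf '\033[32m%s\033[0m\n' 'Yay, all files are same' ;;
        1) printf '\033[31m%s\033[0m\n' 'Oops, Files are not the same' ;;
        *) printf '%s\n' 'no files' ;;
    esac
)
```

Call it as report tmp_dir. If the output may be redirected to a file, you would probably want to skip the escape sequences when stdout is not a terminal (for example by checking [ -t 1 ] first).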
Advanced shell scripters might point out that the quotes in distinct_hashes="$(...)", case "$distinct_hashes", [[ -f "$file" ]], [[ -z "$first_file" ]], and first_file="$file" are unnecessary. I like to include optional quotes. Quoting variable expansions is a really important habit to develop, and not everyone will know the intricacies of when they are and aren't required.
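To illustrate the difference between optional and required quotes, here is a small sketch using a hypothetical filename that contains a space. The count_args helper is made up for the demonstration; it just reports how many arguments it received:

```shell
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

file='my notes.txt'   # made-up filename containing a space

# Quotes REQUIRED: an unquoted expansion in a command's argument list is
# word-split, so the command sees two arguments instead of one.
unquoted=$(count_args $file)
quoted=$(count_args "$file")

# Quotes optional: the right-hand side of a plain assignment is never
# word-split, so the value is copied intact either way.
first_file=$file

echo "unquoted: $unquoted, quoted: $quoted, copy: $first_file"
# prints "unquoted: 2, quoted: 1, copy: my notes.txt"
```

Since it is easy to forget which contexts split words and which don't, quoting everywhere costs nothing and avoids the failure in the first case.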