I have generated 2 .csv files, one containing the original md5sums of some files in a directory and one containing the md5sums calculated at a specific moment.
md5_original.csv
----------
$1 $2 $3
7815696ecbf1c96e6894b779456d330e,,s1.txt
912ec803b2ce49e4a541068d495ab570,,s2.txt
040b7cf4a55014e185813e0644502ea9,,s64.txt
8a0b67188083b924d48ea72cb187b168,,b43.txt
etc.
md5_$current_date.csv
----------
$1 $2 $3
7815696ecbf1c96e6894b779456d330e,,s1.txt
4d4046cae9e9bf9218fa653e51cadb08,,s2.txt
3ff22b3585a0d3759f9195b310635c29,,b43.txt
etc.
* some files may have been deleted by the time the current md5sums are calculated
I am looking to iterate over the values of column $3 in md5_$current_date.csv
and, for each value of that column, to check whether it exists in md5_original.csv
and, if so, to compare the two values of $1.
Output should be:
s2.txt hash changed from 912ec803b2ce49e4a541068d495ab570 to 4d4046cae9e9bf9218fa653e51cadb08.
b43.txt hash changed from 8a0b67188083b924d48ea72cb187b168 to 3ff22b3585a0d3759f9195b310635c29.
I have written the script that builds these two .csv files, but I am struggling with the awk part that has to do what I described above. I don't know if there is a better way to do this; I am a newbie.
CodePudding user response:
I would use GNU AWK for this task in the following way. Let the content of md5_original.csv be
7815696ecbf1c96e6894b779456d330e {BLANK_COLUMN} s1.txt
912ec803b2ce49e4a541068d495ab570 {BLANK_COLUMN} s2.txt
040b7cf4a55014e185813e0644502ea9 {BLANK_COLUMN} s64.txt
8a0b67188083b924d48ea72cb187b168 {BLANK_COLUMN} b43.txt
and the content of md5_current.csv be
7815696ecbf1c96e6894b779456d330e {BLANK_COLUMN} s1.txt
4d4046cae9e9bf9218fa653e51cadb08 {BLANK_COLUMN} s2.txt
3ff22b3585a0d3759f9195b310635c29 {BLANK_COLUMN} b43.txt
then
awk 'FNR==NR{arr[$3]=$1;next}($3 in arr)&&($1 != arr[$3]){print $3 " hash changed from " arr[$3] " to " $1}' md5_original.csv md5_current.csv
output
s2.txt hash changed from 912ec803b2ce49e4a541068d495ab570 to 4d4046cae9e9bf9218fa653e51cadb08
b43.txt hash changed from 8a0b67188083b924d48ea72cb187b168 to 3ff22b3585a0d3759f9195b310635c29
Explanation: FNR is the row number within the current file and NR is the global row number, so the two are equal only while the first file is being processed. While processing the first file I build the array arr, whose keys are filenames and whose values are the corresponding hash values; next causes GNU AWK to move on to the next line, i.e. no other action is taken, so the rest applies only to files after the first. ($3 in arr) is a condition: is the current $3 one of the keys of arr? ($1 != arr[$3]) additionally requires that the hash differs from the stored one. If both hold, I print the concatenation of the current $3 (the filename), the string " hash changed from ", the value stored under key $3 in arr (the old hash value), the string " to ", and $1 (the current hash value). If the filename is not present in arr, no action is taken.
Edit: added an exclusion for hashes that have not changed, as suggested in a comment.
(tested in gawk 4.2.1)
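Note that the sample data above is whitespace-separated, while the files in the question are comma-separated with an empty second field. For that layout the field separator probably has to be set explicitly, e.g. with -F','. The following is only a sketch under that assumption: the scratch directory and the array name orig are made up for the demo, and it assumes that no filename contains a comma.

```shell
#!/bin/sh
# Hypothetical scratch directory, used only for this demo.
tmp=$(mktemp -d)

# Sample data in the comma-separated layout from the question (empty 2nd field).
cat > "$tmp/md5_original.csv" <<'EOF'
7815696ecbf1c96e6894b779456d330e,,s1.txt
912ec803b2ce49e4a541068d495ab570,,s2.txt
040b7cf4a55014e185813e0644502ea9,,s64.txt
8a0b67188083b924d48ea72cb187b168,,b43.txt
EOF

cat > "$tmp/md5_current.csv" <<'EOF'
7815696ecbf1c96e6894b779456d330e,,s1.txt
4d4046cae9e9bf9218fa653e51cadb08,,s2.txt
3ff22b3585a0d3759f9195b310635c29,,b43.txt
EOF

# -F',' splits on commas, so $1 is the hash and $3 is the filename.
out=$(awk -F',' '
    # First file: remember the original hash for each filename.
    FNR == NR { orig[$3] = $1; next }
    # Second file: report names known from the first file whose hash differs.
    ($3 in orig) && ($1 != orig[$3]) {
        print $3 " hash changed from " orig[$3] " to " $1
    }
' "$tmp/md5_original.csv" "$tmp/md5_current.csv")

printf '%s\n' "$out"
rm -rf "$tmp"
```

With the sample data this should report s2.txt and b43.txt only: s1.txt is unchanged, and s64.txt is missing from the current file, so it is never looked up.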