awk: comparing two files containing numbers

Time:11-12

I'm using this command to compare two files and print out lines in which $1 is different:

awk -F, 'NR==FNR {exclude[$1];next} !($1 in exclude)' old.list new.list > changes.list

The files I'm working with have already been sorted numerically with sort -n.

old.list:

30606,10,57561
30607,100,26540
30611,300,35,5.068
30612,100,211,0.035
30613,200,5479,0.005
30616,100,2,15.118
30618,0,1257,0.009
30620,14,8729,0.021

new.list:

30606,10,57561
30607,100,26540
30611,300,35,5.068
30612,100,211,0.035
30613,200,5479,0.005
30615,50,874,00.2
30616,100,2,15.118
30618,0,1257,0.009
30620,14,8729,0.021
30690,10,87,0.021
30800,20,97,1.021

Result:

30615,50,874,00.2
30690,10,87,0.021
30800,20,97,1.021

I'm looking for a way to tweak my command so that awk prints a line only if its $1 from new.list is not only unique but also greater than $1 from the last line of old.list.

Expected result:

30690,10,87,0.021
30800,20,97,1.021

This is because 30690 and 30800 ($1) are greater than 30620 ($1 from the last line of old.list). In this case, 30615,50,874,00.2 would not be printed: 30615 is admittedly unique to new.list, but it is also less than 30620.

awk -F, '{if ($1 #from new.list > $1 #from_the_last_line_of_old.list) print }'

Something like that, but I'm not sure it can be done this way.

Thank you

CodePudding user response:

You can keep the awk you have, then pipe its output through sort (numeric, high to low) and through head to get only the line with the largest $1:

awk -F, 'FNR==NR{seen[$1]; next} !($1 in seen)' old new | sort -t, -k1,1nr | head -n1
30800,20,97,1.021

Or find the max in awk while reading the second file, and print it from an END block:

awk -F, 'FNR==NR{seen[$1]; next} 
(!($1 in seen)) {uniq[$1]=$0; max= $1>max ? $1 : max}
END {print uniq[max]}' old new 
30800,20,97,1.021

After a cup of coffee and reading your edit, just do this:

awk -F, 'FNR==NR{ref=$1; next} $1>ref' old new
30690,10,87,0.021
30800,20,97,1.021
  1. Since you are only interested in values greater than $1 on the last line of old, there is no need to store or test the other lines of that file.

  2. Just read the whole first file and keep the last $1 (the file is already sorted, so that is the largest), then compare it to $1 in the new file. If old is not sorted, or you want to skip the sorting step, track the maximum instead:

    FNR==NR{ref=$1>ref ? $1 : ref; next}

  3. If you also need to deduplicate the values in new, you can do that as part of the sort step you are already doing:

    sort -t, -k 1,1 -n -u new
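
Points 2 and 3 can also be combined in a single awk call. What follows is only a minimal sketch, not something from the answer above: it assumes the file names old.list and new.list from the question, the sample files are trimmed and deliberately unsorted copies created just to make the snippet runnable, and the names tmp, ref, seen and result are illustrative.

```shell
# Work in a throwaway directory so no real files are overwritten.
tmp=$(mktemp -d)

# Trimmed, deliberately unsorted copies of the question's data (assumption:
# real inputs have the same comma-separated layout with the key in $1).
printf '%s\n' '30620,14,8729,0.021' '30606,10,57561' '30616,100,2,15.118' > "$tmp/old.list"
printf '%s\n' '30800,20,97,1.021' '30615,50,874,00.2' '30690,10,87,0.021' \
              '30690,10,87,0.021' '30606,10,57561' > "$tmp/new.list"

# First file:  remember the largest $1 instead of relying on sort order.
# Second file: print a line only if $1 exceeds that maximum and this $1
#              has not been printed yet (seen[] drops repeated keys).
result=$(awk -F, '
    FNR==NR { if (FNR==1 || $1+0 > ref) ref = $1+0; next }
    $1+0 > ref && !seen[$1]++
' "$tmp/old.list" "$tmp/new.list")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data the two lines with keys 30800 and 30690 are printed, in the order they appear in new.list. One caveat: if old.list is empty, FNR==NR is also true while reading new.list, so nothing is printed.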
