Compare two text files line by line, finding differences but ignoring numerical values differences

Time:05-23

I'm working on a bash script that compares two similar text files line by line and reports any differences, along with the line number where each difference occurs. Numerical values should be ignored in this comparison.
Example:

Process is running; process found : 12603 process is listening on port 1200
Process is running; process found : 43023 process is listening on port 1200

In the example above, the script shouldn't report any difference, since only the process ID differs and it changes all the time.
Otherwise, I want it to notify me of the differences between the lines.
Example:

Process is running; process found : 12603 process is listening on port 1200
Process is not running; process found : 43023 process is not listening on port 1200

I already have a working script to find the differences, and I've used the following function to find them while ignoring numerical values, but it's not working perfectly. Any suggestions?

COMPARE_FILES()
{
    awk 'NR==FNR{a[FNR]=$0;next}$0!~a[FNR]{print $0}' $1 $2
}

Where $1 and $2 are the two files to compare.

CodePudding user response:

Would you please try the following:

COMPARE_FILES() {
    awk '
    NR==FNR {a[FNR]=$0; next}
    {
        b=$0; gsub(/[0-9]+/,"",b)
        c=a[FNR]; gsub(/[0-9]+/,"",c)
        if (b != c) {printf "< %s\n> %s\n", $0, a[FNR]}
    }' "$1" "$2"
}
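
Since the question also asks for the line number of each difference, the function above could be extended roughly as follows. This is only a sketch, not part of the original answer: the name `compare_files_numbered` and the output format are assumptions, and it prints the first file's line after `<` and the second file's line after `>`.

```shell
# Hypothetical variant: also report the line number of each differing line.
compare_files_numbered() {
    awk '
    NR==FNR { a[FNR] = $0; next }           # first file: remember every line
    {
        b = $0;     gsub(/[0-9]+/, "", b)   # second file line with numbers removed
        c = a[FNR]; gsub(/[0-9]+/, "", c)   # same line of the first file, numbers removed
        if (b != c) {
            # "<" is the line from the first file, ">" the line from the second
            printf "line %d:\n< %s\n> %s\n", FNR, a[FNR], $0
        }
    }' "$1" "$2"
}
```

It would be called the same way, e.g. `compare_files_numbered old.log new.log`, where both file names are placeholders.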

CodePudding user response:

Any suggestions ?

Strip the digits before making the comparison. I would improve your code as follows: replace

NR==FNR{a[FNR]=$0;next}$0!~a[FNR]{print $0}

with

NR==FNR{a[FNR]=$0;next}gensub(/[[:digit:]]/,"","g",$0)!~gensub(/[[:digit:]]/,"","g",a[FNR]){print $0}

Explanation: I use the gensub string function because it returns a new string (whereas gsub alters the value of the variable passed to it). I replace every [:digit:] character with an empty string (i.e. delete it) globally. Note that gensub is a GNU awk extension, so this requires gawk.
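
For awk implementations without gensub, the same idea might be written with a small user-defined function. This is a sketch under assumptions: `compare_files_portable` and `strip_digits` are names made up here, not built-ins. It also uses `!=` for a plain string comparison, because `!~` treats its right-hand side as a regular expression.

```shell
# Hypothetical portable variant: strip_digits() is defined here, it is not an awk built-in.
compare_files_portable() {
    awk '
    # Scalars are passed by value, so gsub only changes the local copy.
    function strip_digits(s) { gsub(/[0-9]/, "", s); return s }
    NR==FNR { a[FNR] = $0; next }                              # first file: store lines
    strip_digits($0) != strip_digits(a[FNR]) { print $0 }      # second file: print mismatches
    ' "$1" "$2"
}
```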

CodePudding user response:

Using any awk:

compare_files() {
    awk '{key=$0; gsub(/[0-9]+(\.[0-9]+)?/,RS,key)} NR==FNR{a[FNR]=key; next} key!~a[FNR]' "${@}"
}

The above doesn't just remove the digits, it replaces every set of numbers, whether they're integers like 17 or decimals like 17.31, with the contents of RS (a newline by default) to avoid false matches like:

file1: foo 1234 bar
file2: foo bar

If you just remove the digits then those 2 lines incorrectly become identical:

file1: foo bar
file2: foo bar

whereas if you replace digits with a newline then they correctly remain not identical:

file1: foo 
bar
file2: foo bar
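
The difference between the two normalisations can be checked with a small helper like the one below. It is only an illustrative sketch: `lines_match` and its `strip`/`mark` modes are assumptions made for this demo, and the pair `foo1234 bar` / `foo bar` is chosen here so that whitespace plays no part in the result.

```shell
# Hypothetical helper: compare two lines after normalising their numbers.
# $1 = mode ("strip" deletes numbers, "mark" replaces each with a newline)
# $2, $3 = the two lines to compare
lines_match() {
    awk -v mode="$1" -v x="$2" -v y="$3" 'BEGIN {
        repl = (mode == "mark") ? RS : ""     # RS is a newline by default
        gsub(/[0-9]+(\.[0-9]+)?/, repl, x)    # integers like 17 or decimals like 17.31
        gsub(/[0-9]+(\.[0-9]+)?/, repl, y)
        print (x == y ? "same" : "different")
    }'
}

lines_match strip "foo1234 bar" "foo bar"   # prints "same": a false match
lines_match mark  "foo1234 bar" "foo bar"   # prints "different"
```

Lines that differ only in the numbers themselves still compare as equal in `mark` mode, since every number is replaced by the same placeholder.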