Compare two files and store differences using a conditional

I found half of the solution to my problem, but I cannot work out how to add a conditional that handles the other half. I am using awk. The field separator is ; and the values are enclosed in double quotes ("). Each file has exactly 3 fields.

I have two files (file1.txt and file2.txt) and want to store the differences in a third file (results.txt).

file1.txt

"SWITCH1";"rack7";"Datacenter1"
"SWTICH46";"rack1";"rack1"
"ROUTER3";"";"rack1"
"SWITCH7";"rack1";"rack1"
"ROUTER9";"rack1";"rack1"
"ROUTER22";"rack1";"Datacenter4"

file2.txt

"SWITCH1";"rack7";"Datacenter1"
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"

If I use:

awk -F';' 'FNR==NR {a[$0];next} !($0 in a)' file1.txt file2.txt

I get:

"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"

But I do not want a lone " in $2 of file2.txt to count as a difference when $2 in file1.txt is rack1. Whenever an entry in file2.txt has " in field $2 and the entry with the same $1 in file1.txt has rack1 in field $2, it should be treated as a match and discarded.

The file is generated dynamically every night, and when this happens, field $2 is rack1 in file1.txt while field $2 is " in file2.txt. This match should be excluded, in addition to the exact matches I already managed to exclude with the awk command above. The expected output is below:

Desired results.txt

"SWITCH51";"rack7";"Datacenter2"

I am struggling to find a conditional to handle this scenario.

CodePudding user response：

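You could check whether the value of field 2 is just " and, if so, replace it with "rack1".

Assigning to $2 makes awk rebuild $0 using OFS, which is why OFS is set to ; as well. If $0 is not in array a after the replacement, print the unmodified row, which is saved in the tmp variable in the example.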

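awk '
BEGIN { FS = OFS = ";" }       # same separator for splitting and rebuilding $0
FNR == NR { a[$0]; next }      # first file: remember every full line
{
  tmp = $0                     # keep the original line for printing
  sub(/^"$/, "\"rack1\"", $2)  # a lone " in field 2 becomes "rack1"
  if (!($0 in a)) print tmp    # print only if the normalized line is new
}
' file1.txt file2.txt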

Output

"SWITCH51";"rack7";"Datacenter2"

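To try this end to end and store the result in results.txt as the question asks, something like the following should work. It is only a sketch: the scratch directory created with mktemp and the workdir variable are assumptions for illustration, and the sample data is copied from the question.

```shell
#!/bin/sh
# Scratch directory for the sample files (an assumption for this sketch).
workdir=$(mktemp -d)

# Sample data copied from the question.
cat > "$workdir/file1.txt" <<'EOF'
"SWITCH1";"rack7";"Datacenter1"
"SWTICH46";"rack1";"rack1"
"ROUTER3";"";"rack1"
"SWITCH7";"rack1";"rack1"
"ROUTER9";"rack1";"rack1"
"ROUTER22";"rack1";"Datacenter4"
EOF

cat > "$workdir/file2.txt" <<'EOF'
"SWITCH1";"rack7";"Datacenter1"
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"
EOF

# Normalize a lone " in field 2 to "rack1" before the lookup,
# then redirect the surviving lines into results.txt.
awk '
BEGIN { FS = OFS = ";" }
FNR == NR { a[$0]; next }
{
  tmp = $0
  sub(/^"$/, "\"rack1\"", $2)
  if (!($0 in a)) print tmp
}
' "$workdir/file1.txt" "$workdir/file2.txt" > "$workdir/results.txt"

cat "$workdir/results.txt"
```

The redirect replaces results.txt on every run, which seems to fit a file that is regenerated nightly; use >> instead if the differences should accumulate.
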
CodePudding user response：

Based on the samples you have shown, please try the following awk code. A brief explanation: while reading the first input file, it creates two arrays, a and b, indexed by $0 and by $1,$3 respectively. While reading the second input file, it checks two conditions: if $1,$3 is NOT present in b AND $0 is not present in a, it prints that line from the second file.

awk -F';' '
FNR==NR{
  a[$0]
  b[$1,$3]
  next
}
!(($1,$3) in b) && !($0 in a)
' file1.txt file2.txt
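
Note that this condition skips any file2.txt line whose $1,$3 pair exists in file1.txt, whatever $2 contains. If only the specific case from the question should be ignored (a lone " in file2.txt where file1.txt has "rack1"), a stricter variant is possible. The following is a minimal sketch rather than part of the answer above: the array names seen and rack, the scratch directory, and the fourth line added to file2.txt are hypothetical, included only to show where the two behaviours differ.

```shell
#!/bin/sh
# Scratch directory for the sample files (an assumption for this sketch).
workdir=$(mktemp -d)

# file1.txt as in the question.
cat > "$workdir/file1.txt" <<'EOF'
"SWITCH1";"rack7";"Datacenter1"
"SWTICH46";"rack1";"rack1"
"ROUTER3";"";"rack1"
"SWITCH7";"rack1";"rack1"
"ROUTER9";"rack1";"rack1"
"ROUTER22";"rack1";"Datacenter4"
EOF

# file2.txt as in the question, plus one HYPOTHETICAL last line whose
# field 2 differs from file1.txt in a way that is not the " artifact.
cat > "$workdir/file2.txt" <<'EOF'
"SWITCH1";"rack7";"Datacenter1"
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"
"SWITCH1";"rack9";"Datacenter1"
EOF

awk -F';' '
FNR == NR {
  seen[$0]             # every full line of the first file
  rack[$1, $3] = $2    # field 2 as recorded in the first file
  next
}
($0 in seen) { next }  # identical line: not a difference
# Lone " in the second file where the first file has "rack1": ignore.
($2 == "\"" && rack[$1, $3] == "\"rack1\"") { next }
{ print }              # everything else is a real difference
' "$workdir/file1.txt" "$workdir/file2.txt" > "$workdir/results.txt"

cat "$workdir/results.txt"
```

With this input the SWITCH51 line and the hypothetical "SWITCH1";"rack9";"Datacenter1" line are printed, while the ROUTER22 line is still dropped. The shorter $1,$3 check would drop the rack9 line as well.
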