AWK print out the mis-matched records from files comparison

Time:11-23

I need help listing the records in Employee.txt that have no match in Car.txt, using the following example files on AIX 6.x.

Employee.txt

1|Sam|Smith|Seatle
2|Barry|Jones|Seatle
3|Garry|Brown|Houston
4|George|Bla|LA
5|Celine|Wood|Atlanta
6|Jody|Ford|Chicago

Car.txt

100|red|1
110|green|9
120|yellow|2
130|yellow|6
140|red|8
150|white|0

bash-4.3$ awk -F"|" 'NR==FNR { empcar[$1]=$0; next } { if (empcar[$3]) print empcar[$3] "|" $1 "|" $2 > "match.txt"; else print $0 > "no_match.txt" }' Employee.txt Car.txt
110|green|9
140|red|8
150|white|0

match.txt
1|Sam|Smith|Seatle|100|red
2|Barry|Jones|Seatle|120|yellow
6|Jody|Ford|Chicago|130|yellow

no_match.txt
110|green|9
140|red|8
150|white|0

bash-4.3$ awk -F"|" 'NR==FNR { empcar[$1]=$0; next } !($3 in empcar)' Employee.txt Car.txt

This produced the same list as in no_match.txt.

However, I want the no_match.txt to be as follows:

3|Garry|Brown|Houston
4|George|Bla|LA
5|Celine|Wood|Atlanta

In other words, print each row of Employee.txt whose employee number does not appear in Car.txt. I couldn't work out how to reference those unmatched records in the else branch.

I also encountered a lot of unexplained duplicates in match.txt when running this against my private data, which I cannot disclose.

Many thanks, George

CodePudding user response:

print the row in Employee.txt when does not have employee no. in Car.txt.

You may use this solution:

awk -F"|" '
NR == FNR {
   empcar[$3]
   next
}
{
   print > ($1 in empcar ? "match.txt" : "no_match.txt")
}' Car.txt Employee.txt

cat match.txt

1|Sam|Smith|Seatle
2|Barry|Jones|Seatle
6|Jody|Ford|Chicago

cat no_match.txt

3|Garry|Brown|Houston
4|George|Bla|LA
5|Celine|Wood|Atlanta

Note that Car.txt is processed first, and every employee number from its 3rd field is stored as a key in the array empcar. Then, while processing Employee.txt, each row is redirected to match.txt or no_match.txt depending on whether its $1 exists in the associative array empcar.
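
If only the unmatched employees are needed, the same two-pass idea can be reduced to a single pattern. The following is a minimal sketch rather than a definitive script: it recreates the sample files from the question in a temporary directory, and the names seen, workdir and unmatched are chosen here purely for illustration.

```shell
# Work in a scratch directory so existing files are not overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
cat > Employee.txt <<'EOF'
1|Sam|Smith|Seatle
2|Barry|Jones|Seatle
3|Garry|Brown|Houston
4|George|Bla|LA
5|Celine|Wood|Atlanta
6|Jody|Ford|Chicago
EOF

cat > Car.txt <<'EOF'
100|red|1
110|green|9
120|yellow|2
130|yellow|6
140|red|8
150|white|0
EOF

# Pass 1 (Car.txt, where NR == FNR): record each employee number from field 3.
# Pass 2 (Employee.txt): print only rows whose field 1 was never recorded.
unmatched=$(awk -F'|' 'NR == FNR { seen[$3]; next } !($1 in seen)' Car.txt Employee.txt)

printf '%s\n' "$unmatched"
```

Because the output is driven by the rows of Employee.txt, each employee is printed at most once, however many rows Car.txt holds for that employee number. This may be relevant to the duplicates mentioned in the question if the real Car.txt lists the same employee more than once, although that cannot be confirmed without the data.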
