AWK Script: Matching Two Files on a Unique Identifier and Appending a Field on the First Match

Time:12-10

I'm trying to compare two files using one field as a unique identifier. File 1 holds the account number, which I look up in the second file. If the account number appears in both files, I then check the matching value against a condition and append the result to the original record from file 1.

Sample file 1:

ACCT1,PHONE1,TEST1
ACCT2,PHONE2,TEST3

Sample file 2:

ACCT1,SOMETHING1
ACCT1,SOMETHING3
ACCT1,SOMETHING1
ACCT1,SOMETHING3
ACCT2,SOMETHING1
ACCT2,SOMETHING3
ACCT2,SOMETHING1
ACCT2,SOMETHING1

However, my awk script always picks up the last occurrence of the account in file 2, even when a matching value has already appeared earlier in the file.

Actual output with the code below:

ACCT1,PHONE1,TEST1,000
ACCT2,PHONE2,TEST3,001

Expected output:

ACCT1,PHONE1,TEST1,001
ACCT2,PHONE2,TEST3,001

Code I'm trying:

awk -f test.awk pass=0 samplefile2.txt pass=1 samplefile1.txt > output.txt

BEGIN{
}
pass==0{
   FS=","
   ACT=$1
   RES1[ACT]=$2
}
pass==1{
   ACCTNO=$1
   PHNO=$2
   FIELD3=$3
   LVCODE=RES1[ACCTNO]
   if(LVCODE=="SOMETHING1"){ OTHERFLAG="001" }
   else if(LVCODE=="SOMETHING4"){ OTHERFLAG="002" }
   else{ OTHERFLAG="000" }

   printf("%s\,", ACCTNO)
   printf("%s\,", PHNO)
   printf("%s\,", FIELD3)
   printf("%s", OTHERFLAG)
   printf "\n"
}

I tried looping over the variable that holds the array, but unfortunately that turned into an infinite loop when I ran it.

CodePudding user response:

You may use this awk command:

awk '
BEGIN {FS=OFS=","}
NR==FNR {
   map[$1] = $0
   next
}
$1 in map {
   print map[$1], ($2 == "SOMETHING1" ? "001" : ($2 == "SOMETHING4" ? "002" : "000"))
   delete map[$1]
}' file1 file2

Output:

ACCT1,PHONE1,TEST1,001
ACCT2,PHONE2,TEST3,001

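Once we print a record for an account from file2, we delete that account from the associative array map, so only the first file2 record for each account is evaluated. Note that accounts from file1 that never appear in file2 are not printed at all.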
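A variation on the same idea, as a minimal sketch rather than a definitive fix: read file2 first and remember only the first value seen for each account, then walk file1 so that the output keeps file1's order and accounts missing from file2 still get 000. The temporary directory, the variable name result, and the array name first are illustrative assumptions; the sample data from the question is recreated inline so the snippet runs on its own.

```shell
# Recreate the question's sample data in a scratch directory (illustrative paths).
dir=$(mktemp -d)

cat > "$dir/file1" <<'EOF'
ACCT1,PHONE1,TEST1
ACCT2,PHONE2,TEST3
EOF

cat > "$dir/file2" <<'EOF'
ACCT1,SOMETHING1
ACCT1,SOMETHING3
ACCT1,SOMETHING1
ACCT1,SOMETHING3
ACCT2,SOMETHING1
ACCT2,SOMETHING3
ACCT2,SOMETHING1
ACCT2,SOMETHING1
EOF

result=$(awk '
BEGIN { FS = OFS = "," }
# First file on the command line (file2): keep only the first value per account.
NR == FNR {
   if (!($1 in first)) first[$1] = $2
   next
}
# Second file (file1): append the flag; accounts absent from file2 fall through to 000.
{
   v = ($1 in first) ? first[$1] : ""
   print $0, (v == "SOMETHING1" ? "001" : (v == "SOMETHING4" ? "002" : "000"))
}' "$dir/file2" "$dir/file1")

printf '%s\n' "$result"
```

With the sample data this prints ACCT1,PHONE1,TEST1,001 and ACCT2,PHONE2,TEST3,001. Because the guard `!($1 in first)` ignores later records, the last occurrence can no longer overwrite an earlier one, which is what the original RES1[ACT]=$2 assignment did.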

CodePudding user response:

It sounds like you want the first occurrence of each ACCTx in samplefile2.txt whose value is SOMETHING1 or SOMETHING4. I think you should read samplefile1.txt into a data structure first and then iterate line by line over samplefile2.txt looking for your criteria:

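BEGIN {
    FS=","
    # Compare against 0: getline returns -1 on an unreadable file,
    # which would otherwise make this loop run forever.
    while ((getline < ACCOUNTFILE) > 0) accounts[$1]=$0
    close(ACCOUNTFILE)
}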
{ OTHERFLAG = "" }
$2 == "SOMETHING1" { OTHERFLAG="001" }
$2 == "SOMETHING4" { OTHERFLAG="002" }
($1 in accounts) && OTHERFLAG!="" {
    print(accounts[$1] "," OTHERFLAG)
    # delete the accounts so that it does not print again.
    # Only the first occurrence in samplefile2.txt will matter.
    delete accounts[$1]
}
END {
    # Print remaining accounts that did not match above
    for (acct in accounts) print(accounts[acct] ",000")
}

Run the above with:

awk -v ACCOUNTFILE=samplefile1.txt -f test.awk samplefile2.txt

I am not sure what you want to happen if both SOMETHING1 and SOMETHING4 appear in samplefile2.txt for the same account. If you want precedence, so that a SOMETHING4 that comes later overrules an earlier SOMETHING1, you will need additional logic. In that case you probably want to drop the 'delete', keep updating the result for each account until you reach the end of the file, and then print all the accounts in the END block.
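The precedence idea could be sketched roughly as follows. This is one possible reading, not necessarily what the asker needs: it assumes SOMETHING4 always wins over SOMETHING1 for the same account, regardless of which comes first. The arrays flag and order, the variable precedence_result, and the modified samplefile2.txt contents are hypothetical and chosen only to exercise the rule.

```shell
# Scratch copies of the input files (illustrative paths).
dir=$(mktemp -d)

cat > "$dir/samplefile1.txt" <<'EOF'
ACCT1,PHONE1,TEST1
ACCT2,PHONE2,TEST3
EOF

# Hypothetical data: ACCT1 has SOMETHING1 followed by SOMETHING4.
cat > "$dir/samplefile2.txt" <<'EOF'
ACCT1,SOMETHING1
ACCT1,SOMETHING4
ACCT2,SOMETHING3
ACCT2,SOMETHING1
EOF

precedence_result=$(awk '
BEGIN { FS = OFS = "," }
# samplefile1.txt: remember each record and the order the accounts appeared in.
NR == FNR { accounts[$1] = $0; order[++n] = $1; next }
# samplefile2.txt: keep updating the flag instead of deleting the account.
$1 in accounts {
   if ($2 == "SOMETHING4") flag[$1] = "002"            # assumed to always overrule
   else if ($2 == "SOMETHING1" && flag[$1] != "002") flag[$1] = "001"
}
# Print every account once, in samplefile1.txt order; unmatched accounts get 000.
END {
   for (i = 1; i <= n; i++) {
      acct = order[i]
      print accounts[acct], (flag[acct] != "" ? flag[acct] : "000")
   }
}' "$dir/samplefile1.txt" "$dir/samplefile2.txt")

printf '%s\n' "$precedence_result"
```

With this hypothetical input, ACCT1 ends up with 002 and ACCT2 with 001. Keeping a separate order array avoids relying on `for (acct in accounts)`, whose iteration order awk does not guarantee.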
