I'm trying to compare two files, using the account number field as the unique identifier to match on. File 1 has the account number, which I look up in the second file. If the account number exists in both files, the next step is to check a condition on the matched value and append the result to the original record.
Sample file 1:
ACCT1,PHONE1,TEST1
ACCT2,PHONE2,TEST3
Sample file 2:
ACCT1,SOMETHING1
ACCT1,SOMETHING3
ACCT1,SOMETHING1
ACCT1,SOMETHING3
ACCT2,SOMETHING1
ACCT2,SOMETHING3
ACCT2,SOMETHING1
ACCT2,SOMETHING1
But awk always ends up with the last occurrence in the file, even when there is already a match before the end of the records.
Actual output, using the code below:
ACCT1,PHONE1,TEST1,000
ACCT2,PHONE2,TEST3,001
Expected Output:
ACCT1,PHONE1,TEST1,001
ACCT2,PHONE2,TEST3,001
Code I'm trying, run with:
awk -f test.awk pass=0 samplefile2.txt pass=1 samplefile1.txt > output.txt
where test.awk contains:
BEGIN{
    FS=","      # set FS before any record is read so the first line is split on commas too
}
pass==0{
    ACT=$1
    RES1[ACT]=$2    # runs for every line, so each account keeps only its LAST value
}
pass==1{
    ACCTNO=$1
    PHNO=$2
    FIELD3=$3
    LVCODE=RES1[ACCTNO]
    if(LVCODE=="SOMETHING1"){ OTHERFLAG="001" }
    else if(LVCODE=="SOMETHING4"){ OTHERFLAG="002" }
    else{ OTHERFLAG="000" }
    printf("%s,", ACCTNO)
    printf("%s,", PHNO)
    printf("%s,", FIELD3)
    printf("%s", OTHERFLAG)
    printf "\n"
}
I tried looping over the array that holds the values, but the run ended up in an infinite loop.
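A minimal sketch of the smallest change to the two-pass script, assuming the intended rule is "the first value listed for an account in samplefile2.txt decides the flag": pass 0 stores a value only when the account is not yet in RES1. The temporary directory and the recreated sample files are scaffolding for the demo only; FS and OFS are both set to a comma in BEGIN.

```shell
#!/bin/sh
# Scaffolding: recreate the question's sample files in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'ACCT1,PHONE1,TEST1' 'ACCT2,PHONE2,TEST3' > "$dir/samplefile1.txt"
printf '%s\n' 'ACCT1,SOMETHING1' 'ACCT1,SOMETHING3' 'ACCT1,SOMETHING1' 'ACCT1,SOMETHING3' \
    'ACCT2,SOMETHING1' 'ACCT2,SOMETHING3' 'ACCT2,SOMETHING1' 'ACCT2,SOMETHING1' \
    > "$dir/samplefile2.txt"

# Same two-pass layout as test.awk; only pass 0 changes.
cat > "$dir/test.awk" <<'EOF'
BEGIN { FS = OFS = "," }
pass == 0 {
    # Store the value only the first time an account is seen; later lines are ignored.
    if (!($1 in RES1)) RES1[$1] = $2
}
pass == 1 {
    LVCODE = RES1[$1]
    if (LVCODE == "SOMETHING1") { OTHERFLAG = "001" }
    else if (LVCODE == "SOMETHING4") { OTHERFLAG = "002" }
    else { OTHERFLAG = "000" }
    print $1, $2, $3, OTHERFLAG
}
EOF

awk -f "$dir/test.awk" pass=0 "$dir/samplefile2.txt" pass=1 "$dir/samplefile1.txt" > "$dir/output.txt"
cat "$dir/output.txt"
```

With the sample data this prints ACCT1,PHONE1,TEST1,001 and ACCT2,PHONE2,TEST3,001. Because the second pass walks samplefile1.txt, the output keeps that file's order, and an account that never appears in samplefile2.txt still gets a line with 000.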
Answer:
You may use this awk command:
awk '
BEGIN {FS=OFS=","}
NR==FNR {
map[$1] = $0
next
}
$1 in map {
print map[$1], ($2 == "SOMETHING1" ? "001" : ($2 == "SOMETHING4" ? "002" : "000"))
delete map[$1]
}' file1 file2
Output:
ACCT1,PHONE1,TEST1,001
ACCT2,PHONE2,TEST3,001
Once a matching record from file2 is printed, we delete that record from the associative array map, so only the first matching record for each account is evaluated.
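Note that "first matching record" here means the first file2 line whose account is in map, whatever its second field is. A small check of the command above, wrapped in a hypothetical function first_record_flag so it can be re-run, using hypothetical data (not from the question) in which ACCT1's first line is SOMETHING3 and a SOMETHING1 line follows:

```shell
#!/bin/sh
# Hypothetical input: ACCT1's FIRST line in file2 is SOMETHING3, a SOMETHING1 line comes later.
dir=$(mktemp -d)
printf '%s\n' 'ACCT1,PHONE1,TEST1' 'ACCT2,PHONE2,TEST3' > "$dir/file1"
printf '%s\n' 'ACCT1,SOMETHING3' 'ACCT1,SOMETHING1' 'ACCT2,SOMETHING1' > "$dir/file2"

# The awk program from the answer, unchanged, wrapped in a shell function.
first_record_flag() {
  awk '
  BEGIN {FS=OFS=","}
  NR==FNR {
    map[$1] = $0
    next
  }
  $1 in map {
    print map[$1], ($2 == "SOMETHING1" ? "001" : ($2 == "SOMETHING4" ? "002" : "000"))
    delete map[$1]
  }' "$1" "$2"
}

first_record_flag "$dir/file1" "$dir/file2"
```

This prints ACCT1,PHONE1,TEST1,000 and ACCT2,PHONE2,TEST3,001: the SOMETHING3 line consumes ACCT1, so the later SOMETHING1 line is never looked at. If only SOMETHING1/SOMETHING4 lines should count, the pattern has to test $2 as well before deleting.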
Answer:
It sounds like you want the first occurrence of ACCTx in samplefile2.txt where SOMETHING1 or SOMETHING4 is present. I think you should read samplefile1.txt into a data structure first, then iterate line by line over samplefile2.txt looking for your criteria:
BEGIN {
FS=","
while ((getline < ACCOUNTFILE) > 0) accounts[$1]=$0   # "> 0" stops at EOF (0) and on a read error (-1), avoiding an infinite loop
}
{ OTHERFLAG = "" }
$2 == "SOMETHING1" { OTHERFLAG="001" }
$2 == "SOMETHING4" { OTHERFLAG="002" }
($1 in accounts) && OTHERFLAG!="" {
print(accounts[$1] "," OTHERFLAG)
# delete the account so that it is not printed again.
# Only the first occurrence in samplefile2.txt will matter.
delete accounts[$1]
}
END {
# Print remaining accounts that did not match above
for (acct in accounts) print(accounts[acct] ",000")
}
Save the script above as test.awk and run it with:
awk -v ACCOUNTFILE=samplefile1.txt -f test.awk samplefile2.txt
I am not sure what you want to do if both SOMETHING1 and SOMETHING4 appear in samplefile2.txt for the same ACCT1. If you want precedence, so that SOMETHING4 overrules SOMETHING1 when it comes after it, you will need additional logic. In that case you probably want to drop the delete, keep updating the flag for accounts[$1] until you reach the end of the file, and then print all the accounts in the END block.
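A sketch of that precedence variant, assuming account numbers in samplefile1.txt are unique and that the rule is "the last SOMETHING1/SOMETHING4 line seen for an account wins". The extra ACCT2,SOMETHING4 input line, the precedence.awk file name, and the order and flag arrays are hypothetical, added only for illustration.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' 'ACCT1,PHONE1,TEST1' 'ACCT2,PHONE2,TEST3' > "$dir/samplefile1.txt"
# Sample file 2 from the question plus one HYPOTHETICAL extra line (ACCT2,SOMETHING4)
# so that "a later code overrules an earlier one" has something to do.
printf '%s\n' 'ACCT1,SOMETHING1' 'ACCT1,SOMETHING3' 'ACCT1,SOMETHING1' 'ACCT1,SOMETHING3' \
    'ACCT2,SOMETHING1' 'ACCT2,SOMETHING3' 'ACCT2,SOMETHING1' 'ACCT2,SOMETHING1' \
    'ACCT2,SOMETHING4' > "$dir/samplefile2.txt"

cat > "$dir/precedence.awk" <<'EOF'
BEGIN {
    FS = ","
    # Read the account file once, remembering its original line order.
    while ((getline < ACCOUNTFILE) > 0) {
        accounts[$1] = $0
        order[++n] = $1
    }
    close(ACCOUNTFILE)
}
# No delete: every matching line overwrites the flag, so the LAST code seen wins.
($1 in accounts) && $2 == "SOMETHING1" { flag[$1] = "001" }
($1 in accounts) && $2 == "SOMETHING4" { flag[$1] = "002" }
END {
    # Print every account in samplefile1.txt order; 000 if no code ever matched.
    for (i = 1; i <= n; i++) {
        acct = order[i]
        f = (acct in flag) ? flag[acct] : "000"
        print accounts[acct] "," f
    }
}
EOF

awk -v ACCOUNTFILE="$dir/samplefile1.txt" -f "$dir/precedence.awk" "$dir/samplefile2.txt"
```

This prints ACCT1,PHONE1,TEST1,001 and ACCT2,PHONE2,TEST3,002: the appended SOMETHING4 line overrules ACCT2's earlier SOMETHING1 lines. Keeping an order array avoids relying on the unspecified iteration order of for (acct in accounts) in the END block.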