AWK: how to process and compare multiple files in a script (control) file, not a command-line one-liner

I have read all the answers to similar problems, but they do not work for me because my files are not uniform: they contain several control headers. In such a case it is safer to write a script than a one-liner, and all the answers focus on one-liners. In theory a one-liner should be convertible to a script, but I am struggling to achieve the following:

print the control headers
print only the records starting with 16 in <file 1> whose column 2 value does NOT exist in column 2 of <file 2>

I ended up with this:

BEGIN {
FS="\x01";
OFS="\x01";
RS="\x02\n";
ORS="\x02\n";

file1=ARGV[1];
file2=ARGV[2];
count=0;
}

/^#/ {
    print;
    count++;
}
# reset counters after control headers
NR=1;
FNR=1;
# Below gives syntax error
/^16/ AND NR==FNR {
    a[$2];next;  'FNR==1 || !$2 in a' file1 file2
    }

END {
}

Googling only gives me results for command-line processing, and the documentation is also silent in that regard. Does that mean it cannot be done?

CodePudding user response：

Perhaps try:

script.awk:

BEGIN {
    OFS = FS = "\x01"
    ORS = RS = "\x02\n"
}

NR==FNR {
    if (/^16/) a[$2]
    next
}

/^16/ && !($2 in a) || /^#/

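Note the parentheses: !$2 in a would be parsed as (!$2) in a, which tests whether the negated value (0 or 1) is an index of a, not whether $2 is missing from a.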
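
The effect of the parentheses can be checked on a single record. This is only a small sketch: the array key "x", the sample record 16,y, and the comma separator are made up for the demonstration and are not taken from the question's data.

```shell
# The lookup array a holds one key, "x"; the record's second field is "y".
# Correct form: "y" is not in a, so the record is printed.
correct=$(printf '16,y\n' | awk -F, 'BEGIN { a["x"] } !($2 in a)')

# Without parentheses: !$2 is evaluated first (giving 0 here), and awk then
# asks whether 0 is an index of a. It is not, so nothing is printed.
wrong=$(printf '16,y\n' | awk -F, 'BEGIN { a["x"] } !$2 in a')

printf 'with parentheses:    [%s]\n' "$correct"
printf 'without parentheses: [%s]\n' "$wrong"
```

The first command prints the record and the second prints nothing, even though the only difference between them is the pair of parentheses.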

Invoke with:

awk -f script.awk FILE2 FILE1

Note that the order of FILE1 / FILE2 is reversed; FILE2 must be read first to pre-populate the lookup table.
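
To try the whole thing end to end, something like the following could be used. It is a sketch under stated assumptions: the separators are swapped to a comma and a newline so the sample data stays readable, and the names file1.txt and file2.txt and their contents are hypothetical, not the real data from the question.

```shell
# Work in a scratch directory so no existing files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# script.awk as above, except for the separators (assumed: comma, newline).
cat > script.awk <<'EOF'
BEGIN {
    OFS = FS = ","
}

# First file on the command line (the lookup file): remember column 2
# of every record that starts with 16, then skip to the next record.
NR==FNR {
    if (/^16/) a[$2]
    next
}

# Second file: print 16-records whose column 2 was not seen, plus headers.
/^16/ && !($2 in a) || /^#/
EOF

# Hypothetical lookup file (plays the role of FILE2).
cat > file2.txt <<'EOF'
#HDR file2
16,A,old
16,B,old
17,C,old
EOF

# Hypothetical data file (plays the role of FILE1).
cat > file1.txt <<'EOF'
#HDR file1
16,A,new
16,C,new
16,D,new
17,E,new
EOF

# Lookup file first, data file second.
out=$(awk -f script.awk file2.txt file1.txt)
printf '%s\n' "$out"
```

With this sample data the run prints the header of file1.txt followed by 16,C,new and 16,D,new. The key C is reported as missing because the only C record in file2.txt starts with 17, so it never enters the lookup table.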

CodePudding user response：

First of all, the short answer to my question should be "NOT POSSIBLE"; to anyone who read the question carefully and knew AWK in full, that is the obvious answer. I wish I had known it sooner instead of wasting a few days trying to write a script. Also, there is no such thing as a minimal reproducible example (this was a constant pain on the TeX groups): I need a full working example. If it works on 1 row, there is no guarantee that it works on 2 rows, and my number of rows is ~127 million.

If you read the code carefully, then you would know what is not working: I marked in a comment which part gives the syntax error. Anyway, as @Daweo suggested, there is no way to use a logic operator in the pattern section. So, because we don't need to print anything from the first file, the whole trick is to put the condition inside the second pair of braces:

awk -F, 'BEGIN{} NR==FNR{a[$1];next} !($1 in a) { if (/^16/) print $0} ' set1.txt set2.txt

This assumes that the separator in the above example is a comma. I don't know where the assumption that a multi-character RS is supported only in GNU awk came from. On macOS, BSD awk works exactly the same; in fact, RS="\x02\n" is a single separator, not two separators.
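
For reference, a one-liner like the one above can be moved into a script file with little change: the -F, option can become an FS assignment in BEGIN, and the file names stay on the command line. A minimal sketch, assuming comma-separated input; the file name compare.awk and the sample rows are invented for illustration.

```shell
# Work in a scratch directory so no existing files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical sample data.
printf '1601,old\n1700,old\n' > set1.txt
printf '1601,new\n1602,new\n1700,new\n1800,new\n' > set2.txt

# The same program as the one-liner, stored in a file.
cat > compare.awk <<'EOF'
BEGIN { FS = "," }                  # replaces the -F, option
NR == FNR { a[$1]; next }           # first file: remember column 1
!($1 in a) { if (/^16/) print $0 }  # second file: unseen keys starting with 16
EOF

from_cli=$(awk -F, 'NR==FNR{a[$1];next} !($1 in a) { if (/^16/) print $0 }' set1.txt set2.txt)
from_file=$(awk -f compare.awk set1.txt set2.txt)

printf 'one-liner:   %s\n' "$from_cli"
printf 'script file: %s\n' "$from_file"
```

Both invocations print the same single record, 1602,new, because it is the only row of set2.txt that starts with 16 and whose first column is absent from set1.txt.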