I want to process a CSV input file like the following:
a;b
b;c
b;a
c;d
x;y
d;c
and remove both lines of every duplicate pair, where duplicates are defined by this rule: a;b and b;a are considered duplicates, so both should be removed. The same rule applies to c;d and d;c: both should be removed.
I tried to process the file twice, using the condition NR==FNR to tell which pass it is (first or second), but I can't figure out how to implement the test for the duplication rule defined above.
Please help.
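A minimal sketch of the two-pass NR==FNR idea from the question, assuming the rule can be expressed with a canonical key: the two fields are put in sorted order, so a;b and b;a share one key. The first pass counts each key and the second prints only lines whose key occurred once. The helper name remove_pairs and the file name pairs.csv are hypothetical. Unlike the reversed-pair check, this also drops a line that is repeated identically (a;b twice).

```shell
#!/bin/sh
# remove_pairs is a hypothetical helper name; it takes the input file as $1.
remove_pairs() {
  awk -F';' '
  # Canonical key: the two fields in sorted order, so "a;b" and "b;a" both map to "a;b".
  function key(x, y) { return (x < y) ? x FS y : y FS x }
  NR == FNR { count[key($1, $2)]++; next }  # 1st pass: count each unordered pair
  count[key($1, $2)] == 1                   # 2nd pass: print lines whose pair occurs once
  ' "$1" "$1"
}

# pairs.csv is a hypothetical file name holding the sample data from the question.
printf 'a;b\nb;c\nb;a\nc;d\nx;y\nd;c\n' > pairs.csv
remove_pairs pairs.csv
```

With the sample data this prints b;c and x;y, in input order, because the second pass reads the file sequentially.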
Answer:
Would you please try the following:
awk -F';' '
NR==FNR {                                # 1st pass
    if (($2 FS $1) in seen) {            # if the reversed line "b;a" was already read
        dupe[$1 FS $2]; dupe[$2 FS $1]   # then mark "a;b" and "b;a" as duplicates
    }
    seen[$1 FS $2]                       # remember this line
    next
}
!($0 in dupe)                            # 2nd pass: print unless marked as duplicate
' file file
Output:
b;c
x;y
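In awk, a bare reference such as dupe[k] creates the element with an empty value, so a value test like !dupe[$0] is true even for keys that were "marked"; checking membership with ($0 in dupe) avoids this. A small demonstration (the helper name show_in_vs_value is hypothetical):

```shell
#!/bin/sh
# show_in_vs_value is a hypothetical helper name used only for this demonstration.
show_in_vs_value() {
  awk 'BEGIN {
    dupe["a;b"]                               # bare reference: creates the element with value ""
    if ("a;b" in dupe) print "member"         # membership test: the key exists
    if (!dupe["a;b"]) print "value is empty"  # value test: "" is false, so !"" is true
  }'
}

show_in_vs_value
```

Both lines are printed, which is why the membership test, not the element's value, has to decide whether a line was marked.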
Answer:
$ awk -F';' '{ ks[$0]; a[$2 FS $1] } END { for (k in ks) if (!(k in a)) print k }' file
x;y
b;c
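A for (k in ks) loop visits keys in an unspecified order, so the one-liner may not print lines in input order. A sketch that keeps the input order while still reading the file only once, assuming the whole file fits in memory (keep_order and pairs.csv are hypothetical names). As with the one-liner, a line such as a;a is its own reversal and is dropped.

```shell
#!/bin/sh
# keep_order is a hypothetical helper name; it takes the input file as $1.
keep_order() {
  awk -F';' '
  { n++; line[n] = $0; rev[n] = $2 FS $1; have[$0] }  # remember each line and its reversal
  END {
    for (i = 1; i <= n; i++)
      if (!(rev[i] in have)) print line[i]  # print only if the reversed line never occurred
  }' "$1"
}

# pairs.csv is a hypothetical file name holding the sample data from the question.
printf 'a;b\nb;c\nb;a\nc;d\nx;y\nd;c\n' > pairs.csv
keep_order pairs.csv
```

Iterating over the numeric index 1..n instead of over the keys is what fixes the output order.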