Compare two rows and print only one if a pattern repeats between two columns in any order

This should be fairly simple (hopefully) using awk, but I can't find a solution. I have a file and I want to compare each row against the others: if the combination of strings in column 1 and column 2 repeats in any other row, in either order, I want to print only the first match:

cat file.csv
alpha_3,alpha_47,100,60,0,0,1,60,1,60,8.21E-29,111
alpha_47,alpha_3,100,60,0,0,1,60,1,60,8.21E-29,111
beta_86,beta_12,100,61,0,0,1,61,1,61,2.33E-29,113
beta_86,beta_14,100,61,0,0,1,61,1,61,2.33E-29,113
beta_12,beta_14,100,61,0,0,1,61,1,61,2.33E-29,113
beta_14,beta_12,100,61,0,0,1,61,1,61,2.33E-29,113


#command
This seems to work, but I have to extract the first two columns first,
and I can't print the first instance of the match:

awk -F "," '{print $1 , $2}' file.csv | awk -F' ' '!seen[$2 FS $1]; {seen[$0]++}'
alpha_3 alpha_47
beta_86 beta_12
beta_86 beta_14
beta_12 beta_14

But it doesn't print the whole line, and if I run it without first selecting the first two columns it doesn't work.

#desired output
alpha_3,alpha_47,100,60,0,0,1,60,1,60,8.21E-29,111
beta_86,beta_12,100,61,0,0,1,61,1,61,2.33E-29,113
beta_86,beta_14,100,61,0,0,1,61,1,61,2.33E-29,113
beta_12,beta_14,100,61,0,0,1,61,1,61,2.33E-29,113

I am (still) learning awk, so if someone can provide a solution and explain their code, that would be even better!

CodePudding user response:

The general solution for comparing compound values regardless of order is to sort the keys used to create the array index. With just 2 keys, that reduces to comparing them and always concatenating them in the same order (e.g. biggest first) regardless of their input order:

$ awk -F, '!seen[$1>$2 ? $1 FS $2 : $2 FS $1]++' file.csv
alpha_3,alpha_47,100,60,0,0,1,60,1,60,8.21E-29,111
beta_86,beta_12,100,61,0,0,1,61,1,61,2.33E-29,113
beta_86,beta_14,100,61,0,0,1,61,1,61,2.33E-29,113
beta_12,beta_14,100,61,0,0,1,61,1,61,2.33E-29,113
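
Since an explanation was asked for, here is the same logic spelled out step by step. This is only a sketch of how the one-liner above can be read: the names key, tmp and result are illustrative and not part of the original command, and the sample data is copied from the question into a temporary file so the snippet runs on its own.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
alpha_3,alpha_47,100,60,0,0,1,60,1,60,8.21E-29,111
alpha_47,alpha_3,100,60,0,0,1,60,1,60,8.21E-29,111
beta_86,beta_12,100,61,0,0,1,61,1,61,2.33E-29,113
beta_86,beta_14,100,61,0,0,1,61,1,61,2.33E-29,113
beta_12,beta_14,100,61,0,0,1,61,1,61,2.33E-29,113
beta_14,beta_12,100,61,0,0,1,61,1,61,2.33E-29,113
EOF

# Same logic as the one-liner, expanded into two steps.
result=$(awk -F, '
{
    # Build an order-independent key: the larger of the two fields
    # always goes first, so "a,b" and "b,a" produce the same key.
    key = ($1 > $2) ? $1 FS $2 : $2 FS $1
}
# seen[key]++ evaluates to 0 (false) the first time a key appears and
# to a non-zero count afterwards, so !seen[key]++ is true only for the
# first row of each pair. A true pattern with no action prints $0,
# which is the whole input line.
!seen[key]++
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

The fields here are not numeric, so $1 > $2 is a string comparison. If both columns could look like numbers, awk would compare them numerically; appending an empty string to each side ($1 "" > $2 "") should force a string comparison in that case.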