Compare each row to one another on the basis of first three columns and output only one line

Time:12-08

I have a file like this:

chr1 13369510 13369602 PRAMEF18 0

chr1 13369510 13369602 PRAMEF19 0

I want to compare the first three columns of every row, and if they match, I want output like this:

chr1 13369510 13369602 PRAMEF18,PRAMEF19 0

Answer:

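This should work, assuming the file is tab-separated:

awk 'BEGIN { FS = OFS = "\t" }
{
   key = $1 OFS $2 OFS $3               # columns 1-3 form the grouping key
   if (key in names) {
      names[key] = names[key] "," $4    # key seen before: append column 4
   } else {
      names[key] = $4                   # first row with this key
      order[++n] = key                  # remember input order for printing
   }
   tail = ""                            # columns 5..NF (last row wins)
   for (i = 5; i <= NF; i++) tail = tail OFS $i
   rest[key] = tail
}
END {
   for (j = 1; j <= n; j++) print order[j] OFS names[order[j]] rest[order[j]]
}' your_file.tsv
  • Use columns 1, 2 and 3 as the key, joined with a tab so that values like "chr1 1 23" and "chr1 12 3" do not collapse into the same key.
  • The first time a key is seen, store its 4th column and remember the key's position so the output keeps the input order. On later rows with the same key, append the 4th column after a comma.
  • Keep the remaining columns (5 onward) for each key. If they differ between rows, the values from the last row win.
  • At the end, print one line per key: columns 1-3, the merged 4th column, then the remaining columns.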
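If rows that share the first three columns are already next to each other (for example, after sorting on columns 1 to 3), a streaming variant can print each group as soon as the key changes, instead of holding the whole file in memory. This is a minimal sketch under that assumption: merge_adjacent is a hypothetical function name, and tab-separated input is assumed as in the script above. Unsorted input with repeated keys that are not adjacent would produce several lines for the same key.

```shell
#!/bin/sh
# merge_adjacent (hypothetical name): collapse runs of consecutive rows that
# share columns 1-3, joining column 4 with commas.
# Assumes tab-separated input on stdin, with rows of the same key adjacent.
merge_adjacent() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    key = $1 OFS $2 OFS $3
    if (NR > 1 && key == prev) {
      names = names "," $4                 # same group: extend the name list
    } else {
      if (NR > 1) print prev, names tail   # key changed: flush previous group
      prev = key
      names = $4
    }
    tail = ""                              # columns 5..NF, last row of group wins
    for (i = 5; i <= NF; i++) tail = tail OFS $i
  }
  END { if (NR > 0) print prev, names tail }'
}
```

For unsorted input, sort on the key columns first, for example:
sort -k1,1 -k2,2n -k3,3n your_file.tsv | merge_adjacent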