I have one file (.tsv) that contains the variant calls for all the samples. I would like to merge the first three columns into one column.
Example:
Original file (variants.tsv); the first three columns, which I want to merge, are:
lane sampleID Barcode
B31 00-00-NNA-0000 0000
Desired output:
ID
B31_00-00-NNA-0000_0000
What are the recommended methods?
CodePudding user response:
One way, with a perl one-liner:
perl -F'\t' -lane '
if ($. == 1) {
print join("\t", "ID", @F[3..$#F])
} else {
print join("\t", join("_", @F[0,1,2]), @F[3..$#F])
}' variants.tsv
This splits each line into an array (@F) on tabs, then prints the header and the later lines using slices of that array to extract the appropriate elements, which are joined into delimited strings.
CodePudding user response:
Starting from this
lane sampleID Barcode
B31 00-00-NNA-0000 0000
and using Miller, you can run
mlr --tsv put -S '$ID=$lane."_".$sampleID."_".$Barcode' input.tsv >output.tsv
to get
+------+----------------+---------+-------------------------+
| lane | sampleID       | Barcode | ID                      |
+------+----------------+---------+-------------------------+
| B31  | 00-00-NNA-0000 | 0000    | B31_00-00-NNA-0000_0000 |
+------+----------------+---------+-------------------------+
If you want only the ID field, the command is:
mlr --tsv put -S '$ID=$lane."_".$sampleID."_".$Barcode' then cut -f ID input.tsv >output.tsv