Merge three columns into one (Linux, Python, or Perl)

I have one file (.tsv) that contains variant calls for all the samples. I would like to merge the first three columns into one column:

Example input (file name: variants.tsv). The first three columns, which I want to merge, are:

lane sampleID Barcode

B31 00-00-NNA-0000 0000

Desired output:

B31_00-00-NNA-0000_0000

What are the recommended methods?

CodePudding user response：

One way, with a perl one-liner:

perl -F'\t' -lane '
    if ($. == 1) {
        print join("\t", "ID", @F[3..$#F])
    } else {
        print join("\t", join("_", @F[0,1,2]), @F[3..$#F])
    }' variants.tsv

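This splits each line into an array (@F) on tabs, then prints the header and the later lines using slices of that array. On the header line, the first three names are replaced by a single "ID" column; on every other line, the first three elements are joined with underscores. In both cases the remaining elements are joined back together with tabs.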
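Since the question also allows Python, the same split-and-join logic can be sketched there for comparison. This is only a minimal sketch under stated assumptions: the function name merge_first_three is made up for illustration, the input is assumed to be tab-separated with exactly one header line, and every line is assumed to have at least three fields.

```python
def merge_first_three(lines, sep="\t", joiner="_", header_name="ID"):
    """Yield each line with its first three fields merged into one.

    Hypothetical helper: assumes the first line is a header and that
    every line has at least three sep-delimited fields.
    """
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        if lineno == 1:
            # Header: replace the three column names with a single name.
            merged = header_name
        else:
            # Data line: join the first three values with underscores.
            merged = joiner.join(fields[:3])
        # Keep any remaining columns unchanged.
        yield sep.join([merged] + fields[3:])


# Sample rows taken from the question (only the three columns shown there).
sample = [
    "lane\tsampleID\tBarcode\n",
    "B31\t00-00-NNA-0000\t0000\n",
]
for out_line in merge_first_three(sample):
    print(out_line)
```

To run it on the real file, an open file handle can be passed instead of the sample list, for example inside a with open("variants.tsv") as handle: block, because the function only needs an iterable of lines.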

CodePudding user response：

Starting from this

lane    sampleID    Barcode
B31 00-00-NNA-0000  0000

and using Miller, you can run

mlr --tsv put -S '$ID=$lane."_".$sampleID."_".$Barcode' input.tsv >output.tsv

to get the following (shown as a pretty-printed table for readability; with --tsv the actual output is tab-separated):

+------+----------------+---------+-------------------------+
| lane | sampleID       | Barcode | ID                      |
+------+----------------+---------+-------------------------+
| B31  | 00-00-NNA-0000 | 0000    | B31_00-00-NNA-0000_0000 |
+------+----------------+---------+-------------------------+

If you want only the ID field, the command is

mlr --tsv put -S '$ID=$lane."_".$sampleID."_".$Barcode' then cut -f ID input.tsv >output.tsv
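A name-based approach like the Miller one can also be sketched in Python with the standard csv module. It refers to columns by header name rather than by position. This is a hedged sketch, not a drop-in replacement: the function add_id_column and its only_id parameter are hypothetical names, and the column names lane, sampleID, and Barcode are assumed to match the header exactly.

```python
import csv
import io


def add_id_column(in_handle, out_handle,
                  keys=("lane", "sampleID", "Barcode"), only_id=False):
    """Append an ID column built from the named columns.

    Hypothetical helper: assumes tab-separated input whose header
    contains every name listed in keys.
    """
    reader = csv.DictReader(in_handle, delimiter="\t")
    # Either keep all original columns plus ID, or emit ID alone.
    out_fields = ["ID"] if only_id else list(reader.fieldnames) + ["ID"]
    writer = csv.DictWriter(out_handle, fieldnames=out_fields,
                            delimiter="\t", lineterminator="\n",
                            extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        # csv reads every value as a string, so "0000" keeps its zeros.
        row["ID"] = "_".join(row[k] for k in keys)
        writer.writerow(row)


# In-memory demo using the sample rows from the thread.
src = io.StringIO("lane\tsampleID\tBarcode\nB31\t00-00-NNA-0000\t0000\n")
dst = io.StringIO()
add_id_column(src, dst)
print(dst.getvalue(), end="")
```

Passing only_id=True corresponds to the second Miller command, which keeps just the ID field. For real files, the two handles would be opened with open(..., newline=""), as the csv module documentation recommends.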