How can I find in a tsv file a value and 2 other values corresponding to the same row but in differe

Time:04-16

I have a tsv file that I opened in Numbers (Mac), and it looks like this:

sequence_id    sequence    v_call        d_call       j_call    sequence alignment    junction_id
TTAATAATGTT    GATCCT...   IGHV1-18*04   IGHD5/OR15   IGHJ3*01  CAGATTCA              CARVVLIYDAFDVW
CTGATACAACA    AGAACT...   IGHV3-72*01   IGHD6-2*01   IGHJ3*01  CTGTGCAG              CARLSQRSDGVDFW
TACATTAGTTA    GACTTT...   IGHV4-28*01   IGHD1-4*01   IGHJ3*02  GCTGCAGA              CARKALTTDAFDIW
TAGCTAGCAAA    TTTCCT...   IGHV3-49*04   IGHD1-6*01   IGHJ3*02  TGGTGGAG              CTRVPISWGSFDIW
...

There are 16 708 lines in the tsv file.

I'm given the sequence alignment, and I have to 'grep' it along with the v_call, d_call and j_call from the same row.

I've used grep -n "CTRVPISWGSFDIW" to find the desired sequence, but I do not know how to 'grep' v_call and the other two at the same time.

I'm not sure I'm explaining this clearly, so here is an example: given "CTRVPISWGSFDIW", I want to output the junction_id "CTRVPISWGSFDIW" followed by "IGHV3-49*04", "IGHD1-6*01" and "IGHJ3*02", the corresponding v_call, d_call and j_call for CTRVPISWGSFDIW.

CodePudding user response:

Would you please try an awk solution:

awk 'BEGIN {FS=OFS="\t"} $7=="CTRVPISWGSFDIW" {print $7, $3, $4, $5}' file.tsv

The same command with comments:

awk '
    BEGIN {FS=OFS="\t"}                         # initialize input/output field separators to a TAB character
    $7=="CTRVPISWGSFDIW" {print $7, $3, $4, $5} # if the 7th field equals the string, print the 7th, 3rd, 4th and 5th fields
' file.tsv                                      # input tsv file

Output:

CTRVPISWGSFDIW  IGHV3-49*04     IGHD1-6*01      IGHJ3*02

[Edit]
If you are obliged to use grep, please try:

grep "CTRVPISWGSFDIW" file.tsv | awk 'BEGIN {FS=OFS="\t"} {print $7, $3, $4, $5}'

which will produce the same result.
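If the column order might differ between files, one possible variation is to look the columns up by header name instead of hard-coding $7, $3, $4 and $5. The following is only a sketch: it assumes the first line of the file is the header row shown in the question, and the file name sample.tsv and the awk variable target are hypothetical names introduced for illustration.

```shell
# Build a tiny sample file with the same column layout as in the question
# (sample.tsv is a hypothetical name used only for this sketch).
printf 'sequence_id\tsequence\tv_call\td_call\tj_call\tsequence_alignment\tjunction_id\n' > sample.tsv
printf 'TACATTAGTTA\tGACTTT...\tIGHV4-28*01\tIGHD1-4*01\tIGHJ3*02\tGCTGCAGA\tCARKALTTDAFDIW\n' >> sample.tsv
printf 'TAGCTAGCAAA\tTTTCCT...\tIGHV3-49*04\tIGHD1-6*01\tIGHJ3*02\tTGGTGGAG\tCTRVPISWGSFDIW\n' >> sample.tsv

result=$(awk -v target="CTRVPISWGSFDIW" '
    BEGIN {FS=OFS="\t"}                               # tab-separated input and output
    NR==1 {for (i=1; i<=NF; i++) col[$i]=i; next}     # header row: map column name -> column number
    $col["junction_id"]==target {                     # exact match on the junction_id column
        print $col["junction_id"], $col["v_call"], $col["d_call"], $col["j_call"]
    }
' sample.tsv)

printf '%s\n' "$result"
```

With the sample rows above, this prints the same tab-separated line as the Output shown earlier.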

CodePudding user response:

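I suggest using an awk script.

Provide the pattern to match as the argument inp.

Keep only the lines that match the regular expression in the inp variable (the match is tested against the whole line, not a single column).

Print the 7th, 3rd, 4th and 5th fields. Setting the field separator to a TAB keeps the columns aligned even if a field is empty or contains a space.

   awk -F'\t' -v OFS='\t' '$0 ~ inp {print $7, $3, $4, $5}' inp="CTRVPISWGSFDIW" input.tsv

You can pass a bash variable into inp like this:

   awk -F'\t' -v OFS='\t' '$0 ~ inp {print $7, $3, $4, $5}' inp="$variable" input.tsv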
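Because the $0 ~ inp test is a regular-expression match against the whole line, it can also select rows where the string is only part of a longer value. The sketch below compares it with an exact comparison on the 7th field. The file name sample2.tsv and the second row (a junction with one extra letter) are made-up data for illustration only.

```shell
# Two rows: the second junction merely contains the search string
# (sample2.tsv and its second row are hypothetical illustration data).
printf 'TACATTAGTTA\tGACTTT...\tIGHV4-28*01\tIGHD1-4*01\tIGHJ3*02\tGCTGCAGA\tCARKALTTDAFDIW\n'  > sample2.tsv
printf 'AAAAAAAAAAA\tGACTTT...\tIGHV4-28*01\tIGHD1-4*01\tIGHJ3*02\tGCTGCAGA\tCARKALTTDAFDIWG\n' >> sample2.tsv

variable="CARKALTTDAFDIW"

# Regex match anywhere in the line: counts both rows.
regex_hits=$(awk -F'\t' '$0 ~ inp {n++} END {print n+0}' inp="$variable" sample2.tsv)

# Exact string comparison on the 7th field: counts only the first row.
exact_hits=$(awk -F'\t' '$7 == inp {n++} END {print n+0}' inp="$variable" sample2.tsv)

echo "regex: $regex_hits, exact: $exact_hits"
```

On this sample the regex form counts 2 rows and the exact form counts 1, so $7 == inp may be the safer choice when the full junction_id is known.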