I have a TSV file that I opened in Numbers (Mac), and it looks like this:
sequence_id sequence v_call d_call j_call sequence alignment junction_id
TTAATAATGTT GATCCT... IGHV1-18*04 IGHD5/OR15 IGHJ3*01 CAGATTCA CARVVLIYDAFDVW
CTGATACAACA AGAACT... IGHV3-72*01 IGHD6-2*01 IGHJ3*01 CTGTGCAG CARLSQRSDGVDFW
TACATTAGTTA GACTTT... IGHV4-28*01 IGHD1-4*01 IGHJ3*02 GCTGCAGA CARKALTTDAFDIW
TAGCTAGCAAA TTTCCT... IGHV3-49*04 IGHD1-6*01 IGHJ3*02 TGGTGGAG CTRVPISWGSFDIW
...
There are 16 708 lines in the tsv file.
I'm given the sequence alignment and I have to 'grep' it, along with the v_call, d_call and j_call that correspond to it.
I've used grep -n "CTRVPISWGSFDIW" to find the desired sequence, but I do not know how to 'grep' the v_call and the other two at the same time.
I'm not sure I'm explaining this well, so as an example: given "CTRVPISWGSFDIW", I want to output the junction_id "CTRVPISWGSFDIW" followed by "IGHV3-49*04", "IGHD1-6*01" and "IGHJ3*02", the corresponding v_call, d_call and j_call for CTRVPISWGSFDIW.
CodePudding user response:
Would you please try an awk solution:
awk 'BEGIN {FS=OFS="\t"} $7=="CTRVPISWGSFDIW" {print $7, $3, $4, $5}' file.tsv
The same command with comments:
awk '
BEGIN {FS=OFS="\t"} # initialize input/output field separators to a TAB character
$7=="CTRVPISWGSFDIW" {print $7, $3, $4, $5} # if the 7th field equals the string, print the 7th, 3rd, 4th and 5th fields
' file.tsv # input tsv file
Output:
CTRVPISWGSFDIW IGHV3-49*04 IGHD1-6*01 IGHJ3*02
[Edit]
If you are obliged to use grep, please try:
grep "CTRVPISWGSFDIW" file.tsv | awk 'BEGIN {FS=OFS="\t"} {print $7, $3, $4, $5}'
which will produce the same result.
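If the column order might differ between files, one option is to look the columns up by header name instead of by number. This is only a minimal sketch: it builds a throwaway sample file with mktemp, copies two rows from the question, and assumes the sixth header is spelled sequence_alignment (a guess on my part). The names sample, target and col are made up for the illustration.

```shell
# Build a tiny sample TSV (hypothetical file; rows copied from the question).
sample=$(mktemp)
printf 'sequence_id\tsequence\tv_call\td_call\tj_call\tsequence_alignment\tjunction_id\n' > "$sample"
printf 'TACATTAGTTA\tGACTTT...\tIGHV4-28*01\tIGHD1-4*01\tIGHJ3*02\tGCTGCAGA\tCARKALTTDAFDIW\n' >> "$sample"
printf 'TAGCTAGCAAA\tTTTCCT...\tIGHV3-49*04\tIGHD1-6*01\tIGHJ3*02\tTGGTGGAG\tCTRVPISWGSFDIW\n' >> "$sample"

result=$(awk -v target="CTRVPISWGSFDIW" '
BEGIN { FS = OFS = "\t" }                                   # tab in, tab out
NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }     # map header name -> column number
$(col["junction_id"]) == target {                           # exact match on the junction_id column
    print $(col["junction_id"]), $(col["v_call"]), $(col["d_call"]), $(col["j_call"])
}
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

On the real file you would drop the sample-building lines and pass file.tsv to awk directly.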
CodePudding user response:
I suggest using an awk script.
Provide the matching pattern as the argument inp.
Filter only the lines matching the regexp provided in the inp variable.
Print the 7th, 3rd, 4th and 5th fields.
awk '$0 ~ inp {print $7, $3, $4, $5}' inp="CTRVPISWGSFDIW" input.tsv
You can pass a bash variable into inp like this:
awk '$0 ~ inp {print $7, $3, $4, $5}' inp="$variable" input.tsv
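Note that $0 ~ inp is a regexp match against the whole line, so it also hits rows where the pattern is only part of a field. If that matters, comparing the 7th field with == is stricter. A small sketch of the difference, assuming a throwaway two-row file where the second junction (CTRVPISWGSFDIWAA) and the ids are invented just to show the effect:

```shell
# Hypothetical two-row TSV: the second junction merely contains the pattern.
tsv=$(mktemp)
printf 'id1\tAAA\tIGHV3-49*04\tIGHD1-6*01\tIGHJ3*02\tTGGTGGAG\tCTRVPISWGSFDIW\n' > "$tsv"
printf 'id2\tBBB\tIGHV1-18*04\tIGHD5/OR15\tIGHJ3*01\tCAGATTCA\tCTRVPISWGSFDIWAA\n' >> "$tsv"

variable="CTRVPISWGSFDIW"

# Regexp match on the whole line: both rows are printed.
regex_out=$(awk -F'\t' '$0 ~ inp {print $7, $3, $4, $5}' inp="$variable" "$tsv")

# Exact string comparison on field 7: only the first row is printed.
exact_out=$(awk -F'\t' '$7 == inp {print $7, $3, $4, $5}' inp="$variable" "$tsv")

printf 'regexp match:\n%s\n' "$regex_out"
printf 'exact match:\n%s\n' "$exact_out"
rm -f "$tsv"
```

The -F'\t' here makes awk split on tabs only, which is an addition of mine; without it awk splits on any whitespace, which still works as long as no field contains a space.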