I'm trying to reformat column 4 of the file "orthologsClassification.tsv" so that it keeps everything except the text between the two periods (and one of the periods). I want:
t_gene t_transcript q_gene q_transcript
ENSG00000213096 ENST00000616028.ZNF254 reg_2133 ENST00000616028.ZNF254.2177
ENSG00000213096 ENST00000616028.ZNF254 reg_2053 ENST00000616028.ZNF254.2637
to become:
t_gene t_transcript q_gene q_transcript
ENSG00000213096 ENST00000616028.ZNF254 reg_2133 ENST00000616028.2177
ENSG00000213096 ENST00000616028.ZNF254 reg_2053 ENST00000616028.2637
I've already tried:
awk '{sub(/\..*$/, "", $4)} 1' OFS='\t' orthologsClassification.tsv
but this deletes everything including and after the first period in column 4 (so ENST00000616028.ZNF254.2177 becomes ENST00000616028, when I really want ENST00000616028.2177).
Any ideas? Thank you!
CodePudding user response:
$ awk -F. 'BEGIN{OFS=FS}NR>1{$--NF=$NF}1' file
t_gene t_transcript q_gene q_transcript
ENSG00000213096 ENST00000616028.ZNF254 reg_2133 ENST00000616028.2177
ENSG00000213096 ENST00000616028.ZNF254 reg_2053 ENST00000616028.2637
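This one-liner splits each line on periods and overwrites the second-to-last period-delimited field with the last one. The compact `$--NF=$NF` relies on the order in which awk evaluates the two sides of the assignment. A more explicit sketch of the same idea follows, assuming your awk rebuilds the record when NF is decreased (gawk and mawk do). The inline sample rows stand in for the real file, and `result` is just an illustrative variable name.

```shell
# Feed three tab-separated sample rows (copied from the question) to awk.
result=$(printf '%s\t%s\t%s\t%s\n' \
    t_gene t_transcript q_gene q_transcript \
    ENSG00000213096 ENST00000616028.ZNF254 reg_2133 ENST00000616028.ZNF254.2177 \
    ENSG00000213096 ENST00000616028.ZNF254 reg_2053 ENST00000616028.ZNF254.2637 |
  awk -F. 'BEGIN{OFS=FS}
           # Skip the header; copy the last field over the one before it,
           # then drop the now-duplicated last field.
           NR>1 {$(NF-1)=$NF; NF--}
           1')
printf '%s\n' "$result"
```

Note that this works on the last two periods of the whole line rather than on column 4 specifically, so it assumes column 4 is the final column, as in the example.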
CodePudding user response:
One awk idea using a regex:
$ awk 'BEGIN{FS=OFS="\t"} FNR>1 {sub(/\.[^.]*\./,".",$4)} 1' orthologsClassification.tsv
t_gene t_transcript q_gene q_transcript
ENSG00000213096 ENST00000616028.ZNF254 reg_2133 ENST00000616028.2177
ENSG00000213096 ENST00000616028.ZNF254 reg_2053 ENST00000616028.2637
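To see why this regex behaves differently from the `\..*$` attempt in the question, here is a small sketch that applies both patterns to a single column-4 value. The variable names `value`, `greedy` and `fixed` are only for illustration.

```shell
# One column-4 value taken from the question's sample data.
value='ENST00000616028.ZNF254.2177'

# \..*$ runs from the first period to the end of the string,
# so everything after the first period is removed.
greedy=$(printf '%s\n' "$value" | awk '{sub(/\..*$/, "")} 1')

# \.[^.]*\. matches only a period, a run of non-period characters,
# and the next period (here ".ZNF254."), which is replaced by one period.
fixed=$(printf '%s\n' "$value" | awk '{sub(/\.[^.]*\./, ".")} 1')

printf '%s\n%s\n' "$greedy" "$fixed"
```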
Another awk idea using split on $4 to extract the 1st and 3rd period-delimited subfields:
$ awk 'BEGIN{FS=OFS="\t"} FNR>1 {split($4,a,".");$4=a[1]"."a[3]} 1' orthologsClassification.tsv
t_gene t_transcript q_gene q_transcript
ENSG00000213096 ENST00000616028.ZNF254 reg_2133 ENST00000616028.2177
ENSG00000213096 ENST00000616028.ZNF254 reg_2053 ENST00000616028.2637
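If column 4 might not always have exactly three subfields, the return value of split can be used to keep the first and last subfields instead. This is only a sketch under that assumption: the third sample row below, with an extra `.X` subfield, is hypothetical and not from the question.

```shell
# Sample rows; the last one has a made-up extra subfield in column 4.
result=$(printf '%s\t%s\t%s\t%s\n' \
    t_gene t_transcript q_gene q_transcript \
    ENSG00000213096 ENST00000616028.ZNF254 reg_2133 ENST00000616028.ZNF254.2177 \
    ENSG00000213096 ENST00000616028.ZNF254 reg_2053 ENST00000616028.ZNF254.X.2637 |
  awk 'BEGIN{FS=OFS="\t"}
       # n is the number of period-delimited subfields in column 4;
       # rewrite the column only when there is something between first and last.
       FNR>1 {n=split($4,a,"."); if (n>2) $4=a[1]"."a[n]}
       1')
printf '%s\n' "$result"
```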
CodePudding user response:
Given the example you provided, all you need is:
$ sed 's/\.[^\t]*\././' file
t_gene t_transcript q_gene q_transcript
ENSG00000213096 ENST00000616028.ZNF254 reg_2133 ENST00000616028.2177
ENSG00000213096 ENST00000616028.ZNF254 reg_2053 ENST00000616028.2637
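One caveat: reading `\t` inside a bracket expression as a tab is a GNU sed behavior. In a strictly POSIX bracket expression the backslash is literal, so other sed implementations may read `[^\t]` as "not a backslash and not the letter t". A sketch that avoids the issue, assuming fields are separated by whitespace, uses the POSIX class `[:space:]` instead. The inline rows again stand in for the real file.

```shell
# Same substitution, but "non-whitespace" is spelled with a POSIX character class.
result=$(printf '%s\t%s\t%s\t%s\n' \
    t_gene t_transcript q_gene q_transcript \
    ENSG00000213096 ENST00000616028.ZNF254 reg_2133 ENST00000616028.ZNF254.2177 \
    ENSG00000213096 ENST00000616028.ZNF254 reg_2053 ENST00000616028.ZNF254.2637 |
  sed 's/\.[^[:space:]]*\././')
printf '%s\n' "$result"
```

As with the original command, this replaces the first period-to-period stretch within a single field anywhere on the line. In the example, column 4 is the only field containing two periods.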
CodePudding user response:
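Another awk option, using the stretch from one period to the next within a field as the field separator and rejoining the pieces with a single period:
$ awk -F'\.[^ \t]*\.' NF=NF OFS=. file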