I have a tab-delimited .txt file with three fields; the third field often contains multiple words:
GO:0000002 Akt3 mitochondrial genome maintenance
GO:0000002 Mef2a mitochondrial genome maintenance
GO:0000002 Mgme1 mitochondrial genome maintenance
GO:0000002 Mpv17 mitochondrial genome maintenance
GO:0000002 Mrpl15 mitochondrial genome maintenance
I wanted to swap fields 1 and 2 using awk, but when I run the command:
awk 'BEGIN{OFS="\t"}{print $2,$1,$3;}' file.txt
I get:
Akt3 GO:0000002 mitochondrial
Mef2a GO:0000002 mitochondrial
Mgme1 GO:0000002 mitochondrial
Mpv17 GO:0000002 mitochondrial
Why don't I get all the words in the third field, and how can I fix this?
Thanks in advance
CodePudding user response:
There are several ways to solve this, but the main issue is that only the output field separator has been set. By default, awk splits input on any run of spaces or tabs, so each word of the description becomes its own field and $3 is just "mitochondrial". Setting both FS and OFS to a tab should give the output you want. (A more general approach is to swap only the first two fields and print the rest of the line unchanged.)
awk 'BEGIN{OFS="\t"; FS="\t";} {print $2,$1,$3;}' file.txt
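To see the difference in splitting directly, you can print NF (the number of fields) under each separator. This is only a minimal sketch: it feeds one sample line from the question through printf instead of reading file.txt, and the variable names (line, default_nf, tab_nf) are made up for illustration.

```shell
# One sample record from the question, with real tabs between the three fields.
line=$(printf 'GO:0000002\tAkt3\tmitochondrial genome maintenance')

# Default FS: fields are separated by any run of spaces or tabs.
default_nf=$(printf '%s\n' "$line" | awk '{print NF}')

# FS set to a tab: only tabs separate fields, so the description stays whole.
tab_nf=$(printf '%s\n' "$line" | awk -F'\t' '{print NF}')

echo "default FS: $default_nf fields, tab FS: $tab_nf fields"
```

With the default separator the line splits into 5 fields; with a tab separator it splits into 3.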
CodePudding user response:
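You are printing only 3 fields, which is why you get only 3 fields in the output. You are setting OFS to a tab but not FS, so awk still splits the input on spaces as well as tabs, and $3 holds just the first word of the description.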
What you can do is swap only fields 1 and 2, using a variable such as f1 to hold the value of field 1 during the swap.
Then print the whole line by adding 1 after the closing brace: 1 is an always-true pattern, and its default action is to print the current record. That way, if you have more columns, you don't have to list them all when printing.
awk '
BEGIN{FS=OFS="\t"}
{f1 = $1; $1 = $2; $2 = f1;}1
' file.txt
Output
Akt3 GO:0000002 mitochondrial genome maintenance
Mef2a GO:0000002 mitochondrial genome maintenance
Mgme1 GO:0000002 mitochondrial genome maintenance
Mpv17 GO:0000002 mitochondrial genome maintenance
Mrpl15 GO:0000002 mitochondrial genome maintenance
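The same swap should carry over unchanged to rows with more columns. A small sketch, assuming a hypothetical fourth tab-separated column (the value "extra" is invented here and is not in the original file):

```shell
# Feed one made-up four-column record through the same swap program.
# "extra" is a hypothetical fourth field used only for illustration.
swapped=$(printf 'GO:0000002\tAkt3\tmitochondrial genome maintenance\textra\n' |
  awk 'BEGIN{FS=OFS="\t"} {f1 = $1; $1 = $2; $2 = f1} 1')

printf '%s\n' "$swapped"
```

Only the first two fields change places; the third and fourth fields are printed as they were, still separated by tabs.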