Why is awk printing only the first word in a field?

I have a tab-separated .txt file with three fields; the third field often contains multiple words:

GO:0000002  Akt3    mitochondrial genome maintenance
GO:0000002  Mef2a   mitochondrial genome maintenance
GO:0000002  Mgme1   mitochondrial genome maintenance
GO:0000002  Mpv17   mitochondrial genome maintenance
GO:0000002  Mrpl15  mitochondrial genome maintenance

I wanted to swap fields 1 and 2 using awk, but when I run the command:

awk 'BEGIN{OFS="\t"}{print $2,$1,$3;}' file.txt

I get:

Akt3    GO:0000002  mitochondrial
Mef2a   GO:0000002  mitochondrial
Mgme1   GO:0000002  mitochondrial
Mpv17   GO:0000002  mitochondrial

Why don't I get all the words in the third field, and how can I fix this?

Thanks in advance

Answer:

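There are various ways to solve this, but the main issue is that you set the output field separator (OFS) and not the input field separator (FS). The default FS is a single space, which awk treats specially: fields are separated by any run of blanks (spaces and tabs), and leading and trailing blanks are ignored. So "mitochondrial genome maintenance" is split into $3, $4 and $5, and printing $3 gives only "mitochondrial". Setting both FS and OFS to a tab gives the output you want (or, to make it more general, swap just the first two fields and print the rest of the line unchanged):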

awk 'BEGIN{OFS="\t"; FS="\t";} {print $2,$1,$3;}' file.txt
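To see the splitting directly, here is a small sketch that counts fields (NF) on one line of the question's data, first with the default FS and then with FS set to a tab. It assumes a POSIX shell and awk, and that the real file uses tab characters between columns; the variable name line is just for illustration.

```shell
# One sample line from the question, built with real tab characters.
line=$(printf 'GO:0000002\tAkt3\tmitochondrial genome maintenance')

# Default FS (" "): any run of spaces/tabs separates fields, so the
# three-word description counts as three separate fields.
printf '%s\n' "$line" | awk '{print NF}'                 # prints 5

# FS set to a tab: only tabs separate fields, so the description stays whole.
printf '%s\n' "$line" | awk 'BEGIN{FS="\t"} {print NF}'  # prints 3
```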

Answer:

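You are printing only three fields, and because you set OFS to a tab but not FS, awk still splits the input on every run of spaces and tabs. That makes the third field just "mitochondrial"; "genome" and "maintenance" end up in fields 4 and 5, which you never print.

Instead, set FS as well and swap only fields 1 and 2, using a variable such as f1 to hold the value of field 1 during the swap.

Then add a 1 after the closing brace. It is a pattern that is always true with no action attached, and awk's default action is to print the whole record. Assigning to a field makes awk rebuild the record using OFS, so the output stays tab-separated, and if you have more columns you don't have to list them manually in a print statement.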

awk '
BEGIN{FS=OFS="\t"}
{f1 = $1; $1 = $2; $2 = f1;}1
' file.txt

Output

Akt3    GO:0000002      mitochondrial genome maintenance
Mef2a   GO:0000002      mitochondrial genome maintenance
Mgme1   GO:0000002      mitochondrial genome maintenance
Mpv17   GO:0000002      mitochondrial genome maintenance
Mrpl15  GO:0000002      mitochondrial genome maintenance
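A quick way to check the claim that extra columns need no changes is to run the same swap on a line with a fourth column. This is a sketch assuming a POSIX shell and awk; the trailing extra_col column is hypothetical and not part of the question's data.

```shell
# Hypothetical 4-column tab-separated line (extra_col is made up for illustration).
sample=$(printf 'GO:0000002\tAkt3\tmitochondrial genome maintenance\textra_col')

# Same program as above: swap $1 and $2, then "1" prints the rebuilt record.
swapped=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=OFS="\t"} {f1 = $1; $1 = $2; $2 = f1} 1')

# Columns 3 and 4 come through untouched and still tab-separated.
printf '%s\n' "$swapped"
```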