AWK: print column variable with each character separated by a space

Time:04-26

I have a very large file like so:

ID      Class     Values
126       1       332222330442022...
753       1       332222330442022...
119       1       402224220402022...
830       1       002233440232022...
944       1       222222220002022...

The 3rd column is a string of 50,000 characters. I need to ignore the top line, drop the 2nd column, replace all 3s and 4s in the 3rd column with 1s, and finally print the 3rd column with every character separated by a space.

So the desired output is:

126    1 1 2 2 2 2 1 1 0 1 1 2 0 2 2...
753    1 1 2 2 2 2 1 1 0 1 1 2 0 2 2...
119    1 0 2 2 2 1 2 2 0 1 0 2 0 2 2...
830    0 0 2 2 1 1 1 1 0 2 1 2 0 2 2...
944    2 2 2 2 2 2 2 2 0 0 0 2 0 2 2...

Because the file is so large, it would be good to avoid using split on the 3rd column if possible.

So far, I can achieve everything except printing the 3rd column with its characters separated by spaces, using the following:

awk -F " " 'NR!= 1 { gsub(3,1,$3); gsub(4,1,$3); printf "%s\t%s\n", $1, $3 }' ./input.txt

I know I can use split() similar to the answer here (Split tab delimited column with space) but I need to print $1 also. Is it possible to separate the 3rd column in the same awk command?

CodePudding user response:

You may use this awk:

awk -v OFS='\t' 'NR > 1 {
   gsub(/[34]/, 1, $3)
   gsub(/./, "& ", $3)
   sub(/ $/, "", $3)
   print $1, $3
}' file

126    1 1 2 2 2 2 1 1 0 1 1 2 0 2 2
753    1 1 2 2 2 2 1 1 0 1 1 2 0 2 2
119    1 0 2 2 2 1 2 2 0 1 0 2 0 2 2
830    0 0 2 2 1 1 1 1 0 2 1 2 0 2 2
944    2 2 2 2 2 2 2 2 0 0 0 2 0 2 2
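
To see how the three substitutions fit together, here is a minimal sketch run on two made-up, shortened rows (the sample data and the variable names s and result are only for illustration, they are not from the question). In the replacement string of gsub, & stands for the text that was matched, so "& " turns each character into that character followed by a space. This variant works on a copy of $3 instead of assigning back to the field, which leaves the record itself untouched; whether that makes any measurable difference on 50,000-character lines is something you would have to time yourself.

```shell
# Hypothetical sample input: a header line plus two short rows in the
# same three-column layout as the question.
result=$(printf 'ID Class Values\n126 1 3322223304\n119 1 4022242204\n' |
awk -v OFS='\t' 'NR > 1 {      # NR > 1 skips the header line
    s = $3                     # copy of the 3rd column
    gsub(/[34]/, "1", s)       # every 3 or 4 becomes 1
    gsub(/./, "& ", s)         # "&" is the matched character; add a space after it
    sub(/ $/, "", s)           # remove the single trailing space
    print $1, s                # ID, a tab, then the spaced-out values
}')

printf '%s\n' "$result"
```

No call to split() is needed, and the 2nd column is dropped simply by never printing it.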