How to sample multiple columns in the 2D matrix by using awk with for loop?

Time:10-10

I would like to sample multiple columns in the 2D matrix by using 'awk'.

For instance,

awk -F " " '{print $900, $925, $950, $975, $1000}' [filename].txt > test.txt

I wrote only five columns in the above command as an example. In fact, the number of columns would be over 40. The column number increases in steps of 25, starting from $900.

Writing all $(column number) would be painful.

How could I make the command simpler by using a for loop? Or is there any other suggestion?

Thank you for reading this question.

CodePudding user response:

I would use GNU AWK for this task in the following way. Let the content of file.txt be

A B C D E F
AA BB CC DD EE FF
AAA BBB CCC DDD EEE FFF

and say I want to get the odd columns starting at 1, that is 1, 3, 5; then

awk 'BEGIN{pitch=2}{for(i=1;i<=NF;i+=pitch){printf "%s%s",$i,(i+pitch>NF?"\n":" ")}}' file.txt

gives output

A C E
AA CC EE
AAA CCC EEE

Explanation: the for loop starts at 1, increments i by pitch (2 in this example), and continues while i is less than or equal to the number of fields (NF). Each turn of the loop calls printf once: the first value printed is simply the i-th column ($i); the second is a newline (\n) if that column is the last one to be printed on the current line, or a space in all other cases. To decide whether the column is the last one, I check whether i plus pitch exceeds NF, and the so-called ternary operator condition?valueiftrue:valueiffalse selects the fitting character.

(tested in gawk 4.2.1)
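
The same loop can be adapted to the layout in the question (start at column 900, step 25, 40 columns). The following is only a sketch: the helper name sample_cols and the awk variables start, step, count and last are made up for illustration and are not part of the answer above. It stops at a computed last column rather than at NF, and uses awk's OFS and ORS in place of the literal space and newline.

```shell
# Hypothetical helper: print `count` columns, beginning at column `start`
# and taking every `step`-th column after it.
# Usage: sample_cols START STEP COUNT [file ...]   (reads stdin if no file)
sample_cols() {
    start=$1 step=$2 count=$3
    shift 3
    awk -v start="$start" -v step="$step" -v count="$count" '
    BEGIN { last = start + (count - 1) * step }   # index of the final sampled column
    {
        for (i = start; i <= last; i += step)
            printf "%s%s", $i, (i < last ? OFS : ORS)
    }' "$@"
}

# Demonstration on the three-line sample above: columns 2, 4 and 6
printf '%s\n' 'A B C D E F' 'AA BB CC DD EE FF' 'AAA BBB CCC DDD EEE FFF' |
    sample_cols 2 2 3
# B D F
# BB DD FF
# BBB DDD FFF
```

For the question's case the call would be sample_cols 900 25 40 [filename].txt > test.txt. If a row has fewer than 1875 fields, the missing columns are printed as empty strings.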

CodePudding user response:

jot -s ' ' -w 'Col-%d' 2000 |
mawk '{ print '"$( jot -s ', ' -w '$%d' 40 900 - 25 )"'"" }'
 1  Col-900
    Col-925
    Col-950
    Col-975
    Col-1000
    Col-1025
    Col-1050
    Col-1075
    Col-1100
    Col-1125
    Col-1150
    Col-1175
    Col-1200
    Col-1225
    Col-1250
    Col-1275
    Col-1300
    Col-1325
    Col-1350
    Col-1375
    Col-1400
    Col-1425
    Col-1450
    Col-1475
    Col-1500
    Col-1525
    Col-1550
    Col-1575
    Col-1600
    Col-1625
    Col-1650
    Col-1675
    Col-1700
    Col-1725
    Col-1750
    Col-1775
    Col-1800
    Col-1825
    Col-1850
    Col-1875

The trick is to use jot (or seq, or something similar) to dynamically generate code with hard-coded column numbers.

For the example above, this code is generated on the fly:

mawk '{
    print $900,  $925,  $950,  $975, $1000, $1025, $1050, $1075,
         $1100, $1125, $1150, $1175, $1200, $1225, $1250, $1275,
         $1300, $1325, $1350, $1375, $1400, $1425, $1450, $1475,
         $1500, $1525, $1550, $1575, $1600, $1625, $1650, $1675,
         $1700, $1725, $1750, $1775, $1800, $1825, $1850, $1875, "" }'
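
Since jot is mostly found on BSD and macOS, the same code-generation idea can be sketched with seq, sed and paste, which are also available on typical Linux systems. The helper name field_list and the variable fields are made up for illustration. The end value 1875 is 900 + 39*25, which gives 40 columns.

```shell
# Hypothetical helper: emit "$900,$925,..." for FIRST, STEP and LAST.
field_list() {
    seq "$1" "$2" "$3" |   # one column number per line
        sed 's/^/$/' |     # prefix each number with a literal $
        paste -sd, -       # join the lines with commas
}

fields=$(field_list 900 25 1875)

# Demonstration: build one row of 2000 fields whose k-th field is the
# number k, then splice the generated list into the awk program text.
# The shell expands $fields once; the $900, $925, ... inside it reach
# awk unchanged as field references.
seq 1 2000 | paste -sd' ' - | awk "{ print $fields }"
```

On a real file, the last line would become awk "{ print $fields }" [filename].txt > test.txt.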