bash / awk : filter value from field

Time:06-12

I am trying to filter FILEDIGITS.txt down to the rows whose column 2 matches one of a range of numbers.

for i in `seq -f '%0.f\n' 66979300 100 66982300`; do
awk -v var=$i 'BEGIN{FS=OFS="\t"}{$2 == var }{print $0 }' FILEDIGITS.txt >> FILTERED.txt                        
done

However, no filtering happens: FILTERED.txt is identical to FILEDIGITS.txt.

I checked, and the requested values are present in column 2 of FILEDIGITS.txt, so the filtering should succeed.

Where am I going wrong? Many thanks for the help!

CodePudding user response:

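Do you know this great page? Awk - A Tutorial and Introduction, by Bruce Barnett (Grymoire).

In your script, {$2 == var} sits inside braces, so awk treats it as an action: the comparison is evaluated and its result is thrown away, and the following {print $0} then prints every line. The comparison has to be the pattern, outside the braces, so that it selects which lines get printed.

Give this a try (EDIT: see the comment from @AndrejPodzimek):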

for i in `seq -f '%0.f\n' 66979300 100 66982300`; do
  awk 'BEGIN {FS=OFS="\t"} ; $2 == var' var="${i}" FILEDIGITS.txt >> FILTERED.txt                        
done
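
The loop above starts awk once per value, so FILEDIGITS.txt is read 31 times and the output ends up grouped by value rather than in file order. A minimal single-pass sketch follows; the function name filter_col2 and its argument order (file, lowest value, highest value, step) are made up for illustration, and it assumes column 2 holds plain integers with no leading zeros or padding.

```shell
# Hypothetical helper: print the rows of a tab-separated file whose second
# column equals one of lo, lo+step, ..., hi. The file is read only once.
filter_col2() {
  awk -F'\t' -v lo="$2" -v hi="$3" -v step="$4" '
    BEGIN { for (v = lo; v <= hi; v += step) want[v] }  # build the lookup set
    $2 in want    # pattern with no action: matching lines are printed as-is
  ' "$1"
}

# Usage with the file names from the question:
# filter_col2 FILEDIGITS.txt 66979300 66982300 100 > FILTERED.txt
```

Because the rows come out in their original order, the result can differ in ordering from what the loop produces, even though the same rows are selected.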

CodePudding user response:

If you're only dealing with 31 of these numbers, you might as well match them all in one shot (awk here means mawk or gawk):

 . . . input_data . . . |
   awk '!_<NF' FS="^[^\t]*[\t]($(jot -s'|' - 66979300 66982300 100))[\t]"

PS: use jot instead of seq if you can.

That sub-shell call creates an FS resembling this:

FS: "^[^\t]*[\t](66979300|66979400|66979500|66979600|66979700|
                 66979800|66979900|66980000|66980100|66980200|
                 66980300|66980400|66980500|66980600|66980700|
                 66980800|66980900|66981000|66981100|66981200|
                 66981300|66981400|66981500|66981600|66981700|
                 66981800|66981900|66982000|66982100|66982200|66982300)[\t]"
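
jot comes from BSD and may not be installed on every system. As a rough sketch under that assumption, the same alternation can be built with plain POSIX shell arithmetic. The function name alternation and its argument order (lowest value, highest value, step) are hypothetical, chosen to mirror the jot call above.

```shell
# Hypothetical helper: print lo|lo+step|...|hi on one line.
# Assumes integer arguments with lo <= hi and step > 0.
alternation() {
  lo=$1 hi=$2 step=$3
  out=$lo
  v=$((lo + step))
  while [ "$v" -le "$hi" ]; do
    out="$out|$v"          # append the next value to the alternation
    v=$((v + step))
  done
  printf '%s\n' "$out"
}

# It can stand in for the jot sub-shell call:
# awk '!_<NF' FS="^[^\t]*[\t]($(alternation 66979300 66982300 100))[\t]"
```

Doing the arithmetic in the shell also sidesteps the number formatting that the seq -f '%0.f\n' call in the question is there to control.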

You can build all sorts of numeric ranges this way, such as this one:

 FS: "^[^\t]*[\t](11111|15555|19999|24443|28887|33331|37775|42219|
                  46663|51107|55551|59995|64439|68883|73327|77771|
                  82215|86659|91103|95547|99991|104435|108879|
                  113323|117767|122211|126655|131099|135543|
                  139987|144431|148875|153319|157763|162207|
                  166651|171095|175539|179983|184427|188871|
                  193315|197759|202203|206647|211091|215535|
                  219979|224423|228867|233311|237755|242199|
                  246643|251087|255531|259975|264419|268863|
                  273307|277751|282195|286639|291083|295527|
                  299971|304415|308859|313303|317747|322191|326635|331079)[\t]"

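Setting OFS would be superfluous, since this is purely a filter with no custom action statements.

Once you have that FS in place, the matching criterion takes the form 1 < NF. The FS only matches at the start of a line whose column 2 is one of the listed values, so such a line is split into two fields, while every other line stays as a single field. In the program, !_ evaluates to 1 because _ is an unset variable, so '!_<NF' is just 1 < NF. Note that the trailing [\t] requires a tab after column 2, so the file needs at least three columns.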
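
To see the 1 < NF criterion at work, here is a small demonstration with a one-value version of that FS. The function name show_nf and the two sample rows are made up for this sketch, and it assumes mawk or gawk, as in the answer.

```shell
# Hypothetical demo: print NF followed by the line, for one row whose
# column 2 matches the FS and one row whose column 2 does not.
show_nf() {
  printf 'a\t66979300\tx\nb\t12345\ty\n' |
    awk -v FS="^[^\t]*\t(66979300)\t" '{ print NF ": " $0 }'
}

show_nf
```

The row whose column 2 matches is split at the separator and reports an NF of 2, while the other row has nothing to split on and reports 1, which is why 1 < NF keeps only the matching rows.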
