awk: get log data part by part

Time:10-24

The log file looks like this:

Oct 01 [time] a
Oct 02 [time] b
Oct 03 [time] c
.
.
.
Oct 04 [time] d
Oct 05 [time] e
Oct 06 [time] f
.
.
.
Oct 28 [time] g
Oct 29 [time] h
Oct 30 [time] i

It is really big (millions of lines).

I want to get the logs between Oct 01 and Oct 30.

I can do it with gawk:

gawk 'some conditions' filter.log

It works correctly, but it returns millions of log lines, which is not good because I want to get them part by part.

Something like this:

gawk 'some conditions' -limit 100 -offset 200 filter.log

so that every time I change limit and offset, I get another part of the output.

How can I do that?

Answer:

Building on the OP's pseudocode with actual awk code (the date condition is only an example; replace it with your own conditions):

gawk -v limit=100 -v offset=200 '
$1 == "Oct" && $2 >= 1 && $2 <= 30 {          # example condition: replace with your own
    if (++matches > offset && limit > 0) {    # count matches; skip the first "offset" of them
        print                                 # print current line
        limit--                               # decrement limit
    }
    if (limit == 0) exit                      # optional: stop reading once "limit" lines are printed
}
' filter.log
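The same counting idea can be packaged as a small shell function, so paging is just a matter of changing two arguments. A minimal sketch: the function name `log_page`, its argument order, and the Oct 01 to Oct 30 condition are hypothetical, and it uses plain awk since nothing here needs gawk extensions:

```shell
# log_page OFFSET LIMIT FILE  (hypothetical helper)
# Skips the first OFFSET matching lines, then prints at most LIMIT of them.
log_page() {
    awk -v offset="$1" -v limit="$2" '
    BEGIN { if (limit <= 0) exit }             # a zero limit means nothing to print
    $1 == "Oct" && $2 >= 1 && $2 <= 30 {       # example condition: replace with your own
        if (++matches > offset) {              # skip the first "offset" matches
            print
            if (--limit == 0) exit             # stop reading once the page is full
        }
    }
    ' "$3"
}
```

For example, `log_page 200 100 filter.log` would print matches 201 to 300. Each call rescans the file from the beginning up to the end of the requested page, so later pages take longer on a file with millions of lines.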

Answer:

awk solution: I would harness GNU AWK for this task in the following way. Let file.txt contain

1
2
3
4
5
6
7
8
9

and say I want to print the lines whose 1st field is odd, in the part starting at the 3rd line and ending at the 7th line (inclusive); then I can use GNU AWK in the following way:

awk 'NR<3{next}$1%2{print}NR>=7{exit}' file.txt

which will give

3
5
7

Explanation: NR is a built-in variable which holds the number of the current row. When processing rows before the 3rd, just go to the next row without doing anything; when the remainder of division by 2 is non-zero, print the line; when processing the 7th or a later row, just exit. Using exit might give a noticeable performance boost if you are processing a relatively small part of a file. Observe the order of the 3 pattern-action pairs in the code above: next is first, then whatever you want to do, and exit is last (so the 7th row is still printed before exiting). If you want to know more about NR, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

(tested in GNU Awk 5.0.1)
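If the range should come from the command line rather than being hard-coded, the bounds can be passed in with -v. A sketch under that assumption; the function and variable names (`print_range`, `first`, `last`) are hypothetical, and the odd-number test stands in for whatever condition you actually need:

```shell
# print_range FIRST LAST FILE  (hypothetical helper)
# Prints the lines FIRST..LAST (inclusive) of FILE whose 1st field is odd.
print_range() {
    awk -v first="$1" -v last="$2" '
    NR < first { next }   # skip rows before the range
    $1 % 2     { print }  # the actual condition: 1st field is odd
    NR >= last { exit }   # stop reading after the last row of the range
    ' "$3"
}
```

With the file.txt above, `print_range 3 7 file.txt` prints 3, 5 and 7, the same as the hard-coded version.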

linux solution: If you prefer working with offset and limit, you might exploit a tail/head combination, e.g. for the above file.txt

tail -n +5 file.txt | head -3

gives the output

5
6
7

Observe that the offset goes first, with + before the value (tail -n +5 starts output at the 5th line), and then the limit, with - before the value.
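For the question's log file, the same tail/head idea can be applied to the filter's output, so that offset and limit count matching lines rather than raw lines of the file. A sketch; the function name `filtered_page` and the example Oct 01 to Oct 30 filter are hypothetical. Because tail -n +K starts at line K, skipping OFFSET lines needs K = OFFSET + 1:

```shell
# filtered_page OFFSET LIMIT FILE  (hypothetical helper)
# Skips the first OFFSET matching lines, then prints at most LIMIT of them.
filtered_page() {
    awk '$1 == "Oct" && $2 >= 1 && $2 <= 30' "$3" |   # example filter: replace with your own
        tail -n "+$(($1 + 1))" |                      # start at line OFFSET+1, i.e. skip OFFSET lines
        head -n "$2"                                  # keep at most LIMIT lines
}
```

`filtered_page 200 100 filter.log` would print matches 201 to 300; here the paging is done by tail and head instead of by counters inside awk.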
