How to grep a timestamp and a specific value by key from a log file

I am trying to grep a value by key from a log4j file. The script works fine, but is there any way to include the timestamp in the output as well?

My script is:

awk '/duration=/ {print $4}' mylog.log | awk '{print substr($1, 10, length($1)-10)}' | less

The log file is:

 WARN [20.07.22 00:02:43.647] TaskManager .process() Processing task took longer than set threshold
    |-|  |-| delayEvent=DelayEvent{eventName='UpdateUser', duration=65, endTime=1658275363647, averageValue=0, maxValue=637} | delayThreshold=50
    |-| nanos=45293222359715481 | threadId=523 | threadName=ThreadManager | timestamp=1658275363647 | mid=nag1uu-1#42616
    |-| layer=BAA |-| [ SISS]

 WARN [20.07.22 00:02:44.689] TaskManager .process() Processing task took longer than set threshold
    |-|  |-| delayEvent=DelayEvent{eventName='UpdateUser', duration=88, endTime=1658275364689, averageValue=0, maxValue=637} | delayThreshold=50
    |-| nanos=45293223401808770 | threadId=523 | threadName=ThreadManager | timestamp=1658275364689 | mid=nag1uu-1#42616
    |-| layer=BAA |-| [ SISS]

The output is:

65
88

My expected result is:

20.07.22 00:02:43.647     65
20.07.22 00:02:44.689     88

Is there any way to achieve this? Thanks a lot in advance.

CodePudding user response：

With your shown samples, please try the following awk code. It is written and tested in GNU awk, and it relies on the gawk-specific RT variable.

awk -v RS='\\[([0-9]{2}\\.){2}[0-9]{2} ([0-9]{2}:){2}[0-9]{2}\\.[0-9]{3}[^\n]*\n[^\n]*duration=[0-9]+' '
RT{
  num=split(RT,arr,"[][,]")
  sub(/duration=/,"",arr[num])
  print arr[2],arr[num]
}
' Input_file

Or, using only split (without sub) to get the values out of RT, try the following, which is a minor tweak of the code above.

awk -v RS='\\[([0-9]{2}\\.){2}[0-9]{2} ([0-9]{2}:){2}[0-9]{2}\\.[0-9]{3}[^\n]*\n[^\n]*duration=[0-9]+' '
RT{
  num=split(RT,arr,"[][,]|duration=")
  print arr[2],arr[num]
}
' Input_file

Explanation of the regex (the backslashes are doubled because awk processes escape sequences in a -v assignment before the value is used as a regex):

\\[([0-9]{2}\\.){2}[0-9]{2}         ##Match a literal [ followed by (2 digits followed by a dot),
                                      with that group repeated 2 times, followed by 2 digits.
 ([0-9]{2}:){2}[0-9]{2}\\.[0-9]{3}  ##Match a space followed by (2 digits followed by a colon), with that
                                      group repeated 2 times, followed by 2 digits, a dot and 3 digits.
[^\n]*\n[^\n]*duration=[0-9]+       ##Match the rest of that line and its newline, then everything
                                      on the next line up to and including duration= and its digits.
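
To make the split step concrete, here is a minimal sketch that applies the same "[][,]" separator to a hand-written, single-line stand-in for RT. The real RT spans two lines; the string rt below is an assumption made up for illustration. It uses only POSIX awk features, so it does not need gawk.

```shell
# Show how split() with the separator "[][,]" breaks up a matched string.
# rt is a hypothetical, shortened stand-in for the text gawk would store in RT.
result=$(awk 'BEGIN {
  rt = "[20.07.22 00:02:43.647] TaskManager took too long, duration=65"
  n = split(rt, arr, "[][,]")        # separators: ] or [ or ,
  for (i = 1; i <= n; i++)
    print i "=<" arr[i] ">"          # angle brackets make leading spaces visible
}')
printf '%s\n' "$result"
```

The first piece is empty because the string starts with a separator, the timestamp lands in arr[2], and the last piece still carries the duration= prefix. That is why the answer either removes the prefix with sub or adds duration= to the separator list.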

CodePudding user response：

The first Awk script in your pipeline throws away the timestamp, but you can get what you want by refactoring everything into a single Awk script, which is how it should have been done in the first place.

awk '$2 ~ /^\[/ { sub(/^\[/, "", $2); sub(/\]$/, "", $3); when=$2 " " $3}
  $4 ~ /^duration=/ {print when "\t" substr($4, 10, length($4)-10)}' mylog.log
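
As a variation on the same single-script idea, the sketch below finds duration= with match() instead of assuming it is the fourth field, so it should keep working if the field order on that line changes. The temporary file and its shortened log lines are assumptions for illustration, and only POSIX awk features are used.

```shell
# Hypothetical two-entry sample, shortened from the log in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
 WARN [20.07.22 00:02:43.647] TaskManager .process() Processing task took longer
    |-|  |-| delayEvent=DelayEvent{eventName='UpdateUser', duration=65, endTime=1658275363647}

 WARN [20.07.22 00:02:44.689] TaskManager .process() Processing task took longer
    |-|  |-| delayEvent=DelayEvent{eventName='UpdateUser', duration=88, endTime=1658275364689}
EOF

result=$(awk '
  # Header line: the second field starts with "[", so remember the timestamp.
  $2 ~ /^\[/ { when = substr($2, 2) " " substr($3, 1, length($3) - 1) }
  # Any line containing duration=<digits>: print timestamp, tab, digits.
  match($0, /duration=[0-9]+/) { print when "\t" substr($0, RSTART + 9, RLENGTH - 9) }
' "$log")
printf '%s\n' "$result"
rm -f "$log"
```

The offset of 9 is the length of the string duration=, so the substr call returns only the digits that follow it.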

CodePudding user response：

I would exploit GNU AWK's paragraph mode for this task in the following way. Let the content of file.txt be

 WARN [20.07.22 00:02:43.647] TaskManager .process() Processing task took longer than set threshold
    |-|  |-| delayEvent=DelayEvent{eventName='UpdateUser', duration=65, endTime=1658275363647, averageValue=0, maxValue=637} | delayThreshold=50
    |-| nanos=45293222359715481 | threadId=523 | threadName=ThreadManager | timestamp=1658275363647 | mid=nag1uu-1#42616
    |-| layer=BAA |-| [ SISS]

 WARN [20.07.22 00:02:44.689] TaskManager .process() Processing task took longer than set threshold
    |-|  |-| delayEvent=DelayEvent{eventName='UpdateUser', duration=88, endTime=1658275364689, averageValue=0, maxValue=637} | delayThreshold=50
    |-| nanos=45293223401808770 | threadId=523 | threadName=ThreadManager | timestamp=1658275364689 | mid=nag1uu-1#42616
    |-| layer=BAA |-| [ SISS]

then

awk 'BEGIN{RS="";FS="[\]\[]"}match($0,/duration=[[:digit:]]+/){print $2, substr($0, RSTART+9, RLENGTH-9)}' file.txt

gives output

20.07.22 00:02:43.647 65
20.07.22 00:02:44.689 88

Explanation: I set RS to the empty string to activate paragraph mode, so everything between blank lines is treated as a single record, and I set the field separator to a literal [ or a literal ]. For every record containing duration= followed by one or more digits, I print the 2nd field (the timestamp) followed by a substring of the record. The start and length of that substring are calculated from where the match is: since duration= has 9 characters, I move the start forward by 9 and shorten the length by 9.

(tested in gawk 4.2.1)
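
To see paragraph mode on its own, this sketch counts the lines in each blank-line-separated record. The temporary file and its two shortened entries are assumptions for illustration. FS is set to a newline here purely to count lines, which differs from the bracket separator used above, and nothing in it is specific to gawk.

```shell
# Hypothetical sample: two entries separated by one blank line.
log=$(mktemp)
cat > "$log" <<'EOF'
 WARN [20.07.22 00:02:43.647] first entry
    |-| duration=65, endTime=1658275363647

 WARN [20.07.22 00:02:44.689] second entry
    |-| duration=88, endTime=1658275364689
EOF

# RS="" turns each blank-line-separated block into one record;
# FS="\n" makes every line of that block a separate field.
result=$(awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " NF " lines" }' "$log")
printf '%s\n' "$result"
rm -f "$log"
```

Each log entry arrives as one record, which is what lets a single match() call see both the timestamp line and the duration line.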

CodePudding user response：

GNU Awk

Using = and , as delimiters, the duration value is field $5 and the endTime value is field $7:

awk -F '[=,]' '
    /duration=/{
        # print date.milliseconds<tab>duration
        printf "%s.%s\t%s\n",
            # convert the endTime timestamp (without its last 3 digits) to a date
            # or strftime("%d.%m.%y %H:%M:%S", $7/1000, 1),
            strftime("%d.%m.%y %H:%M:%S", substr($7,1,length($7)-3),1),
            # add milliseconds & duration
            substr($7,length($7)-2),$5
}' mylog.log

20.07.22 00:02:43.647   65
20.07.22 00:02:44.689   88
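
The two substr calls are what turn the millisecond epoch value into the seconds passed to strftime and the milliseconds suffix. This sketch prints just those pieces for one shortened line; the input string is an assumption cut down from the sample log. It uses no gawk extensions, so the strftime call itself is left out.

```shell
# Hypothetical one-line input, shortened from the sample log.
line="delayEvent=DelayEvent{eventName='UpdateUser', duration=65, endTime=1658275363647, averageValue=0}"

result=$(printf '%s\n' "$line" | awk -F '[=,]' '{
  secs = substr($7, 1, length($7) - 3)   # endTime without its last 3 digits
  ms   = substr($7, length($7) - 2)      # the last 3 digits: milliseconds
  print $5, secs, ms                     # duration, epoch seconds, milliseconds
}')
printf '%s\n' "$result"
```

Working on the digits as text keeps the value intact regardless of how a particular awk formats large numbers.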

Another option:

awk -F '[\\[\\]=,]' '/TaskManager/{tsp=$2}/duration/{print tsp"\t"$5}' mylog.log

20.07.22 00:02:43.647   65
20.07.22 00:02:44.689   88

CodePudding user response：

Using any awk:

$ awk -v RS= -F'[][,= ]+' '{print $3, $4, $20}' mylog.log
20.07.22 00:02:43.647 65
20.07.22 00:02:44.689 88

If that's not all you need then edit your question to provide more realistic sample input.