How do I extract log entries between two times without date?-CodePudding

I am trying to have an automated script that can take the latest log entry and collect all the log entries from two hours back regardless of if there are log entries that exist during that time. The issue I keep running into for research is that all the examples I find have a date attached and I do not. A sample log output is:


13:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
13:26:28.713687 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9287:9522, ack 13044, win 420, length 235
13:26:28.713766 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [.], ack 9522, win 24576, length 0
13:26:28.840650 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [.], seq 14286:15624, ack 9522, win 24576, length 1338
13:26:28.848949 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9522:9599, ack 14286, win 420, length 77
13:26:28.849002 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [P.], seq 15624:15674, ack 9599, win 24576, length 50
13:26:28.849023 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9599:9743, ack 14286, win 420, length 144
13:26:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526

So my time is at the front with no date. And date does not like using this, giving me a date: invalid date ‘ %s’ response and does not output anything. My current work is:

#!/bin/bash

truncate -s 0 twoHour.log

NEW=$(tail -n1 $1 | cut -d ":" -f1)
# echo $NEW
New=$(date -d "$NEW"  %s)
OLD=$(($NEW-2))
New=$(date -d "$OLD"  %s)
# echo $OLD
START=$(egrep "$NEW\:\d\d\:\d\d" $1 | tail | date -d  %s)
END=$(egrep "$OLD\:\d\d\:\d\d" $1 | head | date -d  %s)

while read line; do

    # Extract the date for each line.
    # First strip off everything up to the first "[".
    # Then remove everything after the first "]".
    # Finally, straighten up the format with the cleandate function
    date="${date%%.*}"
    date=$( cleandate "$date" )

    # If the date falls between d1 and d2, print it
    if [[ $date -ge $START && $date -le $END ]]; then
         echo "$line"
    fi

done

NEW and OLD are for the hours that are getting extracted. START and END are the boundaries where everything between the two gets outputted line by line. The $1 is for a log file.

I have been trying to modify bash/awk scripts and searching for any premade ones for several hours now, so I am at a loss on how to get this to work.

CodePudding user response：

sed can be used to extract lines expressed by regexp adresses
/^11:.*$/,/^13:26:28.849031 .*$/p

First address could be further refined by getting the minutes digits and adding to the expression as /^11:(2[6-9]|[3-5][0-9]).*$/,/^13:26:28.849031 .*$/p

last_line=$(tail -n1 test.txt)
end_time=$(cut -d ' ' -f1 <<<"$last_line")
end_hour="${end_time:0:2}"
min_msb="${end_time:3:1}"
min_next=$(($min_msb 1))
min_lsb="${end_time:4:1}"
start_hour=$(($end_hour-2))

if [ "$min_msb" -lt 5 ];then
  min_next=$(($min_msb 1))
else
  min_next=5
fi

sed -rn "/^$start_hour:($min_msb[$min_lsb-9]|[$min_next-5][0-9]).*$/,/^$end_time .*$/p" test.txt

If times span more than 24 h

22:57:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
...
00:36:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526

Then

last_line=$(tail -n1 test.txt)
#end_time=$(cut -d ' ' -f1 <<<"$last_line")
end_time="${last_line:0:8}"

start_time="$(date -d "$(date -d "$end_time" --iso=seconds) -2 hour" ' %T')"
#echo "$start_time - $end_time"

end_hour="$(printf "%d" ${end_time:0:2})"
min_msb="$(printf "%d" ${end_time:3:1})"
min_lsb="$(printf "%d" ${end_time:4:1})"
start_hour="${start_time:0:2}"

if [ "$min_msb" -lt 5 ];then
  min_next=$(($min_msb 1))
else
  min_next=5
fi

echo "sed expression: /^$start_hour:($min_msb[$min_lsb-9]|[$min_next-5][0-9]).*$/,/^$end_time .*$/p"
sed -rn "/^$start_hour:($min_msb[$min_lsb-9]|[$min_next-5][0-9]).*$/,/^$end_time.*$/p" test.txt

CodePudding user response：

Assumptions:

the offset (2 hours in OP's example) is less than 24 hours
every line starts with a timestamp of the format HH:MM:SS
the log may span multiple days

Plan:

convert offset (eg, 2 hrs) to seconds; we'll call this offset_secs
grab time from last line of file; we'll call this last_time
convert the timestamp to epoch/seconds; we'll call this last_epoch
subtract offset_secs from last_epoch; we'll call this first_epoch
convert first_epoch back into a HH:MM:SS string; we'll call this first_time
to address the file's timestamps spanning multiple midnights we'll save the lines of interest in an array, resetting the array when we find we have another midnight to go
during awk/END processing we print our array of lines to stdout

One GNU awk idea:

$ cat log.awk
BEGIN { FS="." }                                # set input field delimiter to "."

# first line of input is last line of log file; grab time and calculate the offset/start time

NR==1 { last_time   = $1
        last_epoch  = mktime( strftime("%Y %m %d") " " gensub(/:/," ","g",last_time))
        first_epoch = last_epoch - offset_secs
        first_time  = strftime("%H:%M:%S", first_epoch)

        if (first_time > last_time)
           spans_midnight=1
        next
      }

# for the rest of the input lines determine if the time falls within the last "offset_secs"

      { curr_time = $1
        if ( (  spans_midnight && curr_time >= first_time) ||
             (  spans_midnight && curr_time <= last_time)  ||
             ( !spans_midnight && curr_time >= first_time && curr_time <= last_time) )
           lines[  cnt]=$0
        else {                                  # outside the time range so ...
           delete lines                         # delete anything saved up to this point and ...
           cnt=0                                # reset the array index
        }
      }
END   { for (i=1;i<=cnt;i  )                    # print the lines that occurred within the last "offset_secs"
            print lines[i]
      }

NOTE: see GNU awk: Time Functions for more details on the mktime() and strftime() functions

Test #1: last 2 hours; does not span midnight; file spans midnight

$ cat sample.log
22:22:00.896232 IP 104.16.42.63.https  ignore this line
06:22:00.896232 IP 104.16.42.63.https  ignore this line; crossed midnight
07:22:00.896232 IP 104.16.42.63.https  ignore this line
09:23:00.896232 IP 104.16.42.63.https  ignore this line
09:51:49.896232 IP 104.16.42.63.https  ignore this line
09:51:50.896232 IP 104.16.42.63.https  keep this line
10:24:37.896232 IP 104.16.42.63.https  keep this line
11:51:50.896232 IP 104.16.42.63.https  keep this line

$ offset_secs=$((2*60*60))                   # 2 hours

$ awk -v offset_secs="${offset_secs}" -f log.awk <(tail -1 sample.log) sample.log
09:51:50.896232 IP 104.16.42.63.https  keep this line
10:24:37.896232 IP 104.16.42.63.https  keep this line
11:51:50.896232 IP 104.16.42.63.https  keep this line

Test #2: last 4 hours; spans midnight; file spans multiple midnights

$ cat sample.log
20:22:00.896232 IP 104.16.42.63.https  ignore this line
23:22:00.896232 IP 104.16.42.63.https  ignore this line
01:22:00.896232 IP 104.16.42.63.https  ignore this line; crossed midnight
23:22:00.896232 IP 104.16.42.63.https  ignore this line
01:22:00.896232 IP 104.16.42.63.https  ignore this line; crossed midnight
06:22:00.896232 IP 104.16.42.63.https  ignore this line
07:22:00.896232 IP 104.16.42.63.https  ignore this line
09:23:00.896232 IP 104.16.42.63.https  ignore this line
22:51:49.896232 IP 104.16.42.63.https  ignore this line
22:51:50.896232 IP 104.16.42.63.https  keep this line
23:07:37.896232 IP 104.16.42.63.https  keep this line
00:51:50.896232 IP 104.16.42.63.https  keep this line; crossed midnight
01:24:37.896232 IP 104.16.42.63.https  keep this line
02:51:50.896232 IP 104.16.42.63.https  keep this line

$ offset_secs=$((4*60*60))                   # 4 hours

$ awk -v offset_secs="${offset_secs}" -f log.awk <(tail -1 sample.log) sample.log
22:51:50.896232 IP 104.16.42.63.https  keep this line
23:07:37.896232 IP 104.16.42.63.https  keep this line
00:51:50.896232 IP 104.16.42.63.https  keep this line; crossed midnight
01:24:37.896232 IP 104.16.42.63.https  keep this line
02:51:50.896232 IP 104.16.42.63.https  keep this line