I keep daily log files (like logfile-2022-01-01.log, logfile-2022-01-02.log, and so on). Every line in the files starts with a timestamp, e.g. [2022-05-01 10:00:34.550] ...some strings..., the format being YYYY-MM-DD HH:MM:SS.sss.
I need to filter all the lines between two timestamps, which may require searching across more than one file. For instance:
logfile-2022-01-01.log
[2022-01-01 00:00:25.550] here comes some logging info
[2022-01-01 00:02:25.550] here comes some more logging info
....
[2022-01-01 23:58:29.480] here comes some more logging info
logfile-2022-01-02.log
[2022-01-02 00:01:25.550] here comes some logging info from the next day
[2022-01-02 00:04:25.550] here comes some more logging info from the next day
....
[2022-01-02 23:59:29.480] here comes some more logging info from the next day
I want to extract the lines between 2022-01-01 20:00:00 (which falls in the first file) and 2022-01-02 08:00:00 (which falls in the second file).
I'm expecting to get something like this:
[2022-01-01 23:58:29.480] here comes some more logging info
[2022-01-02 00:01:25.550] here comes some logging info from the next day
[2022-01-02 00:04:25.550] here comes some more logging info from the next day
Any ideas on how to achieve this?
So far I've tried these:
grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk '/^2022-01-01 20:00/,/^2022-01-02 08:00/ {print}'
grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk '$1" "$2 > "2022-01-01 20:00" && $1" "$2 < "2022-01-02 08:00"'
grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk -v beg='2022-01-01 20:00' -v end='2022-01-02 08:00' '{cur=$1" "$2} beg<=cur && cur<=end'
All three run without errors but don't print anything.
CodePudding user response:
Adding some lines to both input files so we can confirm matching on specific strings; also updating the file names to match the timestamp date (05 instead of 01):
$ head logfile*
==> logfile-2022-05-01.log <==
[2022-05-01 00:00:25.550] here comes some logging info
[2022-05-01 00:02:25.550] here comes some more logging info
[2022-05-01 23:56:30.332] here comes more logging info
[2022-05-01 23:58:29.480] here comes some more logging info

==> logfile-2022-05-02.log <==
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:02:39.224] here comes logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day
Tweaking one of OP's current attempts:
$ cat logfile-2022-05-01.log logfile-2022-05-02.log | grep "here comes some" | awk -F'[][]' '$2 >= "2022-05-01 20:00" && $2 <= "2022-05-02 08:00"'
Where:
- replace the first grep with cat
- add the awk dual field delimiter of ] and [ (see the splitting illustration after the output below)
- modify awk to compare only the 2nd field
- modify the awk tests to use inclusive ranges
- update file names and datetime stamps for May (05) instead of Jan (01)
This generates:
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day
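As a quick illustration of how the [][] field delimiter works (a standalone check; the sample line is taken from the files above): awk treats each ] or [ as a separator, so the timestamp lands in $2:
$ echo '[2022-05-01 23:58:29.480] here comes some more logging info' | awk -F'[][]' '{print "$2 = " $2}'
$2 = 2022-05-01 23:58:29.480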
While the tweaked command generates OP's desired results (per comment, OP has stated duplicate lines are OK), once you decide to use awk there's typically no need for separate cat and grep calls.
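For example, a minimal sketch that folds the string match and the range test into a single awk call (same inclusive comparisons as above):
$ awk -F'[][]' '/here comes some/ && $2 >= "2022-05-01 20:00" && $2 <= "2022-05-02 08:00"' logfile-2022-05-01.log logfile-2022-05-02.log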
One unified awk idea that utilizes input variables while also removing duplicate (consecutive) lines:
start='2022-05-01 20:00:00'
end='2022-05-02 08:00:00'
string='here comes some'
awk -F'[][]' -v start="$start" -v end="$end" -v str="$string" '
$2 >= start { printme=1 }     # timestamp has reached "start": turn printing on
$2 > end ".000" { printme=0 } # assumes "end" does not include milliseconds
printme && $0 ~ str { if ($0==last) next # skip duplicate consecutive lines
print last=$0
}
' logfile-2022-05-??.log
This generates:
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day
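For repeated use, the same range logic could be wrapped in a small shell function; this is only a sketch (the logrange name and argument order are illustrative), and it drops the string match and de-duplication for generality:
# hypothetical helper: logrange START END FILE...
# assumes START/END use the same lexically sortable "YYYY-MM-DD HH:MM:SS" format
logrange() {
    start="$1"; end="$2"; shift 2
    awk -F'[][]' -v start="$start" -v end="$end" '
        $2 >= start     { printme=1 }   # reached the start of the window
        $2 > end ".000" { printme=0 }   # past the end of the window
        printme
    ' "$@"
}

logrange '2022-05-01 20:00:00' '2022-05-02 08:00:00' logfile-2022-05-??.log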