Unix get lines between timestamps on multiple files

Time:12-28

I keep daily log files (logfile-2022-01-01.log, logfile-2022-01-02.log, and so on). Every line in the files starts with a timestamp in the format [YYYY-MM-DD HH:MM:SS.sss], e.g. [2022-05-01 10:00:34.550] ...some strings....

I need to extract all the lines between two timestamps, which may mean searching more than one file. For instance:

logfile-2022-01-01.log
[2022-01-01 00:00:25.550] here comes some logging info
[2022-01-01 00:02:25.550] here comes some more logging info
....
[2022-01-01 23:58:29.480] here comes some more logging info

logfile-2022-01-02.log
[2022-01-02 00:01:25.550] here comes some logging info from the next day
[2022-01-02 00:04:25.550] here comes some more logging info from the next day
....
[2022-01-02 23:59:29.480] here comes some more logging info from the next day

I wish to extract the lines between 2022-01-01 20:00:00 (this is contained in the first file) and 2022-01-02 08:00:00 (this is contained in the second file).
I'm expecting to get something like this:

[2022-01-01 23:58:29.480] here comes some more logging info
[2022-01-02 00:01:25.550] here comes some logging info from the next day
[2022-01-02 00:04:25.550] here comes some more logging info from the next day

Any ideas on how to achieve this?

So far I've tried using this:

grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk '/^2022-01-01 20:00/,/^2022-01-02 08:00/ {print}'

grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk '$1" "$2 > "2022-01-01 20:00" && $1" "$2 < "2022-01-02 08:00"'

grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk -v beg='2022-01-01 20:00' -v end='2022-01-02 08:00' '{cur=$1" "$2} beg<=cur && cur<=end'

All three run without errors but don't print anything.

CodePudding user response:

Adding some lines to both input files so we can confirm matching on specific strings; also updating file names to match the timestamp date (05 instead of 01):

$ head  logfile*
==> logfile-2022-05-01.log <==
[2022-05-01 00:00:25.550] here comes some logging info
[2022-05-01 00:02:25.550] here comes some more logging info
[2022-05-01 23:56:30.332] here comes more logging info
[2022-05-01 23:58:29.480] here comes some more logging info


==> logfile-2022-05-02.log <==
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:02:39.224] here comes logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day

Tweaking one of OP's current attempts:

$ cat logfile-2022-05-01.log logfile-2022-05-02.log | grep "here comes some" | awk -F'[][]' '$2 >= "2022-05-01 20:00" && $2 <= "2022-05-02 08:00"'

Where:

  • replace first grep with cat
  • add awk dual field delimiter of ] and [
  • modify awk to only compare the 2nd field
  • modify awk tests to use inclusive ranges
  • update file names and datetime stamps for May (05) instead of Jan (01)
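To make the delimiter and comparison changes concrete, here is a minimal sketch that runs a single sample line through awk. The variable names (line, with_brackets, bare, in_range) are only for illustration. It shows that with default whitespace splitting $1" "$2 still carries the brackets, so the value starts with "[" and sorts after any string starting with "2", which is why the original range tests never matched; with -F'[][]' the bare timestamp lands in $2 and a plain string comparison works because the YYYY-MM-DD HH:MM:SS layout sorts chronologically.

```shell
line='[2022-05-01 23:58:29.480] here comes some more logging info'

# Default whitespace splitting: $1" "$2 keeps the surrounding brackets.
with_brackets=$(printf '%s\n' "$line" | awk '{ print $1" "$2 }')

# Splitting on "[" or "]": $1 is empty and $2 is the bare timestamp.
bare=$(printf '%s\n' "$line" | awk -F'[][]' '{ print $2 }')

# String comparison of the bare timestamp against the range bounds.
in_range=$(printf '%s\n' "$line" |
  awk -F'[][]' '{
    r = ($2 >= "2022-05-01 20:00" && $2 <= "2022-05-02 08:00") ? "yes" : "no"
    print r
  }')

printf '%s\n%s\n%s\n' "$with_brackets" "$bare" "$in_range"
```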

This generates:

[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day

While this generates OP's desired results (per comment OP has stated duplicate lines are ok), once you decide to use awk there's typically no need for separate cat and grep calls.

One unified awk idea that utilizes input variables while also removing duplicate (consecutive) lines:

start='2022-05-01 20:00:00'
end='2022-05-02 08:00:00'
string='here comes some'

awk -F'[][]' -v start="$start" -v end="$end" -v str="$string" '

$2 >= start         { printme=1 }
$2 > end ".000"     { printme=0 }                # assumes "end" does not include milliseconds

printme && $0 ~ str { if ($0==last) next         # skip duplicate consecutive lines
                      print last=$0
                    }

' logfile-2022-05-??.log

This generates:

[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day
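
For anyone wanting to reproduce this, the following is a self-contained sketch that recreates the two sample files in a scratch directory and wraps the awk program in a shell function. The function name filter_logs and the scratch directory are hypothetical, not part of the original answer. One deliberate difference: it uses index($0, str) instead of $0 ~ str, so the search string is matched literally rather than as a regex; swap it back if regex matching is wanted. Like the original, it assumes the files are passed in chronological order and that the end timestamp has no milliseconds.

```shell
# Hypothetical scratch directory; file names mirror the ones used above.
tmpdir=$(mktemp -d)

cat > "$tmpdir/logfile-2022-05-01.log" <<'EOF'
[2022-05-01 00:00:25.550] here comes some logging info
[2022-05-01 00:02:25.550] here comes some more logging info
[2022-05-01 23:56:30.332] here comes more logging info
[2022-05-01 23:58:29.480] here comes some more logging info
EOF

cat > "$tmpdir/logfile-2022-05-02.log" <<'EOF'
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:02:39.224] here comes logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day
EOF

# Usage: filter_logs START END STRING FILE...   (hypothetical helper)
filter_logs() {
  start=$1; end=$2; str=$3; shift 3
  awk -F'[][]' -v start="$start" -v end="$end" -v str="$str" '
    $2 >= start      { printme = 1 }   # entered the time window
    $2 > end ".000"  { printme = 0 }   # left the time window
    printme && index($0, str) {        # literal substring match
      if ($0 == last) next             # skip duplicate consecutive lines
      last = $0
      print
    }
  ' "$@"
}

result=$(filter_logs '2022-05-01 20:00:00' '2022-05-02 08:00:00' \
  'here comes some' "$tmpdir"/logfile-2022-05-??.log)
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

With the sample data this should print the same three lines shown above.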