bash: date comparison and result steering within jq selection

Time:07-29

I am selecting some data from a text log file in bash on Debian with the following command:

cat /mnt/WD1003/logs/sn1.log | grep 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE | \
grep -E "GET_AUDIT" | jq -R '. | split("\t") | (.[4] | fromjson) as $body |
{SatelliteID: $body."Satellite ID", ($body."Piece ID"): {(.[0]): .[3]}}' | \
jq -s 'reduce .[] as $item ({}; . * $item)'

Example "raw" output from the text log file:

2022-07-26T15:03:10.670Z    INFO    piecestore  download started    {"Process": "storagenode", "Piece ID": "DZ4HPUJE7IFLPABM47L3B5MXFIV3L5P2IBM32CXUJBYUOMJBNJBQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-26T15:03:10.893Z    INFO    piecestore  downloaded  {"Process": "storagenode", "Piece ID": "DZ4HPUJE7IFLPABM47L3B5MXFIV3L5P2IBM32CXUJBYUOMJBNJBQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-26T15:47:12.285Z    INFO    piecestore  download started    {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-26T15:47:12.493Z    INFO    piecestore  downloaded  {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}

The current result is the following:

{
  "SatelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE",
  "NG6KAUMU7TP22DNGROKBU2MRRNV675QYEOJC3X2BXH4OCML6BPNQ": {
    "2022-06-28T21:24:39.646Z": "download started",
    "2022-06-28T21:24:40.002Z": "downloaded"
  },
  "IADTQX62PCZQEJRRYPCKNWX3QSPG7A3U53IBWPQRSX6ZMH6I45UQ": {
    "2022-06-28T21:32:40.597Z": "download started",
    "2022-06-28T21:32:40.893Z": "downloaded",
    "2022-07-09T20:00:10.698Z": "download started",
    "2022-07-09T20:00:10.995Z": "downloaded"
  },
  "MZEPH4JSGSAJZ72QQV4YOYYVGLER7KOQPBUB2VEANL4MPNSZDBTA": {
    "2022-06-28T21:58:56.184Z": "download started",
    "2022-06-28T22:01:26.454Z": "downloaded"
  },
  "GFATHGO2WFBZNAOQJKXYNHTFKH2T5T4OXK3BEL7U62FNK5ZRR6OQ": {
    "2022-06-28T22:08:49.765Z": "download started",
    "2022-06-28T22:08:50.089Z": "downloaded"
  },
...
}

I only want output from the jq query above when the time between a "download started" entry and the matching "downloaded" (or "download failed" or "download canceled") entry is larger than 3 minutes. If there is no such case, the result should be empty.

So the target result should look like:

{
  "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE": "2",
  "anotherSatelliteID": "1"
}

The number is the count of time lags larger than 3 minutes between two timestamps, per Satellite ID.

There are several satellites, so in the example result above 2 satellites have issues, with 2 and 1 time lag alerts respectively.

One additional remark: the command should run on macOS, too.

Please advise how I can do this.

Update #1:

I've found another example with multiple downloads of the same piece. I do not expect a second download to start before the first has finished, but if that happens, the Piece ID should be skipped.

$ cat /mnt/WD1003/logs/sn1.log | grep CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ
2022-07-24T02:37:47.570Z    INFO    piecestore  download started    {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-24T02:37:47.815Z    INFO    piecestore  downloaded  {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-26T15:47:12.285Z    INFO    piecestore  download started    {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-26T15:47:12.493Z    INFO    piecestore  downloaded  {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}

Answer:

Here's a solution with jq. Because I'm not sure that the Piece IDs are unique across satellites, I prepend the Satellite ID to each of them.

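jq -Rn '
    reduce (
        # Split each raw line on tabs and parse the JSON payload in field 4.
        inputs / "\t" |
        .[4] |= fromjson |
        select(.[4].Action == "GET_AUDIT") |
        [
            # Drop fractional seconds, then convert to epoch seconds.
            ( .[0] | sub("\\.\\d+Z$"; "Z") | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime ),
            .[3],
            .[4]."Satellite ID",
            ( .[4]."Satellite ID" + "." + .[4]."Piece ID" )
        ]
    ) as [ $time, $event, $id, $address ] (
        # .[0]: alert count per satellite; .[1]: last start time per piece.
        [{},{}];
        if $event == "download started"
        then
            .[1][$address] = $time
        else
            # Count an alert when no start was seen or the lag exceeds 3 minutes.
            if (.[1] | has($address) | not) or ($time - .[1][$address]) / 60 > 3
            then
                .[0][$id] += 1
            else
                .
            end
        end
    ) |
    .[0]
' /mnt/WD1003/logs/sn1.log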
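
The timestamp handling used in the filter can be tried in isolation. The following is only a sketch: `to_epoch` is a hypothetical helper name, and it assumes a jq build with regex support and `strptime`/`mktime` (jq 1.5 or later). It strips the fractional seconds, converts two timestamps from the sample log to epoch seconds, and prints the difference.

```shell
# Hypothetical helper: convert an ISO 8601 UTC timestamp with fractional
# seconds into epoch seconds, the same way the reduce filter does.
to_epoch() {
    jq -rn --arg ts "$1" \
        '$ts | sub("\\.\\d+Z$"; "Z") | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime'
}

start=$(to_epoch "2022-07-26T15:47:12.285Z")
end=$(to_epoch "2022-07-26T15:51:23.789Z")

# 15:51:23 minus 15:47:12 is 4 min 11 s, so this lag is above the 3 minute limit.
echo "$(( end - start )) seconds"   # 251 seconds
```

Because only the difference between two timestamps is compared, the fractional seconds can be dropped without affecting a threshold measured in minutes.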

Given the following log file:

note: I truncated the IDs to 8 chars, removed the irrelevant JSON keys, and marked the unused fields with -. You should have posted your input data in this simplified form :-P

2022-07-24T02:37:47.570Z    -   -   download started    {"Piece ID": "CDTQRMUZ", "Satellite ID": "1wFTAgs9", "Action": "GET_AUDIT"}
2022-07-24T02:37:47.770Z    -   -   download started    {"Piece ID": "CDTQRMUZ", "Satellite ID": "1wFTAgs9", "Action": "GET_AUDIT"}
2022-07-24T02:37:47.815Z    -   -   downloaded  {"Piece ID": "CDTQRMUZ", "Satellite ID": "1wFTAgs9", "Action": "GET_AUDIT"}
2022-07-24T02:37:48.107Z    -   -   downloaded  {"Piece ID": "CDTQRMUZ", "Satellite ID": "1wFTAgs9", "Action": "GET_AUDIT"}
2022-07-26T15:47:12.285Z    -   -   download started    {"Piece ID": "CDTQRMUZ", "Satellite ID": "IADTQX62", "Action": "GET_AUDIT"}
2022-07-26T15:48:13.362Z    -   -   download started    {"Piece ID": "GFATHGO2", "Satellite ID": "4EXtmN5f", "Action": "GET_AUDIT"}
2022-07-26T15:48:13.404Z    -   -   other   {}
2022-07-26T15:48:13.693Z    -   -   downloaded  {"Piece ID": "GFATHGO2", "Satellite ID": "4EXtmN5f", "Action": "GET_AUDIT"}
2022-07-26T15:51:23.789Z    -   -   downloaded  {"Piece ID": "CDTQRMUZ", "Satellite ID": "IADTQX62", "Action": "GET_AUDIT"}
2022-07-26T16:00:00.000Z    -   -   downloaded  {"Piece ID": "MREMD5SU", "Satellite ID": "2LYZZEPV", "Action": "GET_AUDIT"}

The first (...) of reduce should yield something like:

[ 1658630267, "download started", "1wFTAgs9", "1wFTAgs9.CDTQRMUZ" ]
[ 1658630267, "download started", "1wFTAgs9", "1wFTAgs9.CDTQRMUZ" ]
[ 1658630267, "downloaded",       "1wFTAgs9", "1wFTAgs9.CDTQRMUZ" ]
[ 1658630268, "downloaded",       "1wFTAgs9", "1wFTAgs9.CDTQRMUZ" ]
[ 1658850432, "download started", "IADTQX62", "IADTQX62.CDTQRMUZ" ]
[ 1658850493, "download started", "4EXtmN5f", "4EXtmN5f.GFATHGO2" ]
[ 1658850493, "downloaded",       "4EXtmN5f", "4EXtmN5f.GFATHGO2" ]
[ 1658850683, "downloaded",       "IADTQX62", "IADTQX62.CDTQRMUZ" ]
[ 1658851200, "downloaded",       "2LYZZEPV", "2LYZZEPV.MREMD5SU" ]

note: The "other" event is filtered out because its JSON payload has no Action key equal to GET_AUDIT.
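
The filtering step can be checked on its own with two of the sample lines. This is a minimal sketch that assumes the fields are separated by real tab characters, as in the original log; the variable name `kept` is only for illustration.

```shell
# Feed two tab-separated lines to jq: one with an empty JSON payload,
# one whose payload carries "Action": "GET_AUDIT".
kept=$(
    printf '2022-07-26T15:48:13.404Z\t-\t-\tother\t{}\n2022-07-26T15:48:13.693Z\t-\t-\tdownloaded\t{"Action": "GET_AUDIT"}\n' |
    jq -Rr 'split("\t") | .[4] |= fromjson | select(.[4].Action == "GET_AUDIT") | .[3]'
)

# For the first line .[4].Action is null, so select() drops it.
echo "$kept"   # downloaded
```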

And the final result (with the current logic) would be:

{
  "IADTQX62": 1,
  "2LYZZEPV": 1
}
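
The target format in the question shows the counts as strings rather than numbers. If that is really required, one possible post-processing step (an assumption about the desired output, not part of the solution above) is `map_values(tostring)`, sketched here on the result object; in practice it could be appended to the filter after `.[0]`.

```shell
# Result object as produced by the reduce filter for the sample log.
result='{"IADTQX62": 1, "2LYZZEPV": 1}'

# Convert every count to a string; -c prints the object on one line.
as_strings=$(printf '%s' "$result" | jq -c 'map_values(tostring)')

echo "$as_strings"   # {"IADTQX62":"1","2LYZZEPV":"1"}
```

When no lag exceeds the limit, the reduce never writes to its first accumulator, so the result is the empty object `{}`.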