I am selecting some data from a text log file in bash on Debian with the following command:
cat /mnt/WD1003/logs/sn1.log | grep 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE | \
grep -E "GET_AUDIT" | jq -R '. | split("\t") | (.[4] | fromjson) as $body |
{SatelliteID: $body."Satellite ID", ($body."Piece ID"): {(.[0]): .[3]}}' | \
jq -s 'reduce .[] as $item ({}; . * $item)'
Example "raw" output from the text log file:
2022-07-26T15:03:10.670Z INFO piecestore download started {"Process": "storagenode", "Piece ID": "DZ4HPUJE7IFLPABM47L3B5MXFIV3L5P2IBM32CXUJBYUOMJBNJBQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-26T15:03:10.893Z INFO piecestore downloaded {"Process": "storagenode", "Piece ID": "DZ4HPUJE7IFLPABM47L3B5MXFIV3L5P2IBM32CXUJBYUOMJBNJBQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-26T15:47:12.285Z INFO piecestore download started {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-26T15:47:12.493Z INFO piecestore downloaded {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
The current result is the following:
{
"SatelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE",
"NG6KAUMU7TP22DNGROKBU2MRRNV675QYEOJC3X2BXH4OCML6BPNQ": {
"2022-06-28T21:24:39.646Z": "download started",
"2022-06-28T21:24:40.002Z": "downloaded"
},
"IADTQX62PCZQEJRRYPCKNWX3QSPG7A3U53IBWPQRSX6ZMH6I45UQ": {
"2022-06-28T21:32:40.597Z": "download started",
"2022-06-28T21:32:40.893Z": "downloaded",
"2022-07-09T20:00:10.698Z": "download started",
"2022-07-09T20:00:10.995Z": "downloaded"
},
"MZEPH4JSGSAJZ72QQV4YOYYVGLER7KOQPBUB2VEANL4MPNSZDBTA": {
"2022-06-28T21:58:56.184Z": "download started",
"2022-06-28T22:01:26.454Z": "downloaded"
},
"GFATHGO2WFBZNAOQJKXYNHTFKH2T5T4OXK3BEL7U62FNK5ZRR6OQ": {
"2022-06-28T22:08:49.765Z": "download started",
"2022-06-28T22:08:50.089Z": "downloaded"
},
...
}
I only need a result from the jq query above if the time between a "download started" event and the matching "downloaded" (or "download failed" or "download canceled") event is longer than 3 minutes. If there is no such case, the result should be empty.
So the target result should look like:
{
"1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE": "2",
"anotherSatelliteID": "1"
}
Here the number is the count of "time lags" longer than 3 minutes between two timestamps, per Satellite ID.
There are a couple of satellites, so in the example result above we have 2 satellites with issues, with 2 and 1 time lag alerts respectively.
One additional remark: the command should run on macOS, too.
Please advise how I can do that.
Update #1:
I've found another example with multiple downloads of the same piece. I do not expect that a second download can start before the first has finished, but if that happens, the PieceID should be skipped.
$ cat /mnt/WD1003/logs/sn1.log | grep CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ
2022-07-24T02:37:47.570Z INFO piecestore download started {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-24T02:37:47.815Z INFO piecestore downloaded {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-26T15:47:12.285Z INFO piecestore download started {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
2022-07-26T15:47:12.493Z INFO piecestore downloaded {"Process": "storagenode", "Piece ID": "CDTQRMUZITFKKCKWTUHGNCVWE2LYZZEPVELC6ADPMREMD5SURZVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT"}
CodePudding user response:
Here's a solution with jq. As I'm not sure that the PieceIDs are unique, I prepended the SatelliteID to each of them.
jq -Rn '
reduce (
inputs / "\t" |
.[4] |= fromjson |
select(.[4].Action == "GET_AUDIT") |
[
( .[0] | sub("\\.\\d+Z$"; "Z") | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime ),
.[3],
.[4]."Satellite ID",
( .[4]."Satellite ID" + "." + .[4]."Piece ID" )
]
) as [ $time, $event, $id, $address ] (
[{},{}];
if $event == "download started"
then
.[1][$address] = $time
else
if (.[1] | has($address) | not) or ($time - .[1][$address]) / 60 > 3
then
.[0][$id] += 1
else
.
end
end
) |
.[0]
' /mnt/WD1003/logs/sn1.log
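To illustrate the timestamp step in isolation, here is a minimal sketch that converts two timestamps from the sample log in this answer to epoch seconds and subtracts them. The helper name to_epoch is hypothetical (it is not a jq builtin); it just wraps the same sub / strptime / mktime chain used in the program above. The fractional seconds are cut off before parsing because the strptime format has no field for them.

```shell
# Sketch only: to_epoch is a made-up helper name, not part of jq.
lag=$(jq -n '
  # Drop the ".670"-style milliseconds, then parse to epoch seconds.
  def to_epoch:
    sub("\\.\\d+Z$"; "Z") | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime;

  ("2022-07-26T15:51:23.789Z" | to_epoch)
  - ("2022-07-26T15:47:12.285Z" | to_epoch)
')
echo "$lag"   # prints 251 (seconds, i.e. more than 3 minutes)
```

Only the difference between two timestamps matters for the 3 minute check, so a constant offset in the absolute epoch values does not change the outcome.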
Given the following log file:
note: I truncated the IDs to 8 chars, removed the irrelevant JSON keys, and marked the unused fields with -. You should have posted your input data in this simplified form :-P
2022-07-24T02:37:47.570Z - - download started {"Piece ID": "CDTQRMUZ", "Satellite ID": "1wFTAgs9", "Action": "GET_AUDIT"}
2022-07-24T02:37:47.770Z - - download started {"Piece ID": "CDTQRMUZ", "Satellite ID": "1wFTAgs9", "Action": "GET_AUDIT"}
2022-07-24T02:37:47.815Z - - downloaded {"Piece ID": "CDTQRMUZ", "Satellite ID": "1wFTAgs9", "Action": "GET_AUDIT"}
2022-07-24T02:37:48.107Z - - downloaded {"Piece ID": "CDTQRMUZ", "Satellite ID": "1wFTAgs9", "Action": "GET_AUDIT"}
2022-07-26T15:47:12.285Z - - download started {"Piece ID": "CDTQRMUZ", "Satellite ID": "IADTQX62", "Action": "GET_AUDIT"}
2022-07-26T15:48:13.362Z - - download started {"Piece ID": "GFATHGO2", "Satellite ID": "4EXtmN5f", "Action": "GET_AUDIT"}
2022-07-26T15:48:13.404Z - - other {}
2022-07-26T15:48:13.693Z - - downloaded {"Piece ID": "GFATHGO2", "Satellite ID": "4EXtmN5f", "Action": "GET_AUDIT"}
2022-07-26T15:51:23.789Z - - downloaded {"Piece ID": "CDTQRMUZ", "Satellite ID": "IADTQX62", "Action": "GET_AUDIT"}
2022-07-26T16:00:00.000Z - - downloaded {"Piece ID": "MREMD5SU", "Satellite ID": "2LYZZEPV", "Action": "GET_AUDIT"}
The first (...) of reduce should yield something like:
[ 1658633867, "download started", "1wFTAgs9", "1wFTAgs9.CDTQRMUZ" ]
[ 1658633867, "download started", "1wFTAgs9", "1wFTAgs9.CDTQRMUZ" ]
[ 1658633867, "downloaded", "1wFTAgs9", "1wFTAgs9.CDTQRMUZ" ]
[ 1658633868, "downloaded", "1wFTAgs9", "1wFTAgs9.CDTQRMUZ" ]
[ 1658854032, "download started", "IADTQX62", "IADTQX62.CDTQRMUZ" ]
[ 1658854093, "download started", "4EXtmN5f", "4EXtmN5f.GFATHGO2" ]
[ 1658854093, "downloaded", "4EXtmN5f", "4EXtmN5f.GFATHGO2" ]
[ 1658854283, "downloaded", "IADTQX62", "IADTQX62.CDTQRMUZ" ]
[ 1658854800, "downloaded", "2LYZZEPV", "2LYZZEPV.MREMD5SU" ]
note: The other event is filtered out because its JSON body doesn't contain "Action": "GET_AUDIT"
And the final result (with the current logic) would be:
{
"IADTQX62": 1,
"2LYZZEPV": 1
}
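In this result, 2LYZZEPV is counted only because its "downloaded" event has no preceding "download started". If such unmatched events should not count (an assumption, the question does not say either way), one possible adjustment is sketched below. It also removes the stored start time once a download has finished and compares against 180 seconds directly. The three sample lines are built with printf so that the fields are really tab-separated.

```shell
# Sketch under assumptions: unmatched finish events are ignored here.
# Build a tiny tab-separated sample; fields 2 and 3 are unused placeholders.
log=$(printf '%s\t-\t-\t%s\t%s\n' \
  '2022-07-26T15:47:12.285Z' 'download started' '{"Piece ID": "CDTQRMUZ", "Satellite ID": "IADTQX62", "Action": "GET_AUDIT"}' \
  '2022-07-26T15:51:23.789Z' 'downloaded' '{"Piece ID": "CDTQRMUZ", "Satellite ID": "IADTQX62", "Action": "GET_AUDIT"}' \
  '2022-07-26T16:00:00.000Z' 'downloaded' '{"Piece ID": "MREMD5SU", "Satellite ID": "2LYZZEPV", "Action": "GET_AUDIT"}')

result=$(printf '%s\n' "$log" | jq -Rnc '
  reduce (
    inputs / "\t" |
    .[4] |= fromjson |
    select(.[4].Action == "GET_AUDIT") |
    [
      ( .[0] | sub("\\.\\d+Z$"; "Z") | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime ),
      .[3],
      .[4]."Satellite ID",
      ( .[4]."Satellite ID" + "." + .[4]."Piece ID" )
    ]
  ) as [ $time, $event, $id, $address ] (
    [{},{}];                      # [ counts per satellite, open start times ]
    if $event == "download started"
    then .[1][$address] = $time
    elif (.[1] | has($address))   # finish event with a known start
    then
      (if ($time - .[1][$address]) > 180 then .[0][$id] += 1 else . end)
      | del(.[1][$address])       # forget the start once it is matched
    else .                        # finish without a start: ignored
    end
  ) | .[0]
')
echo "$result"   # prints {"IADTQX62":1}
```

This does not implement the "skip the PieceID on overlapping downloads" rule from the question's update; a second start for the same address still just overwrites the first one.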