Taking it a step further than in my previous question: I have a table below with sample data.
datetimestamp | message | server | system |
---|---|---|---|
2022-07-13 08:59:09 | Normal | Server 1 | System 1 |
2022-07-13 08:59:10 | Normal | Server 4 | System 2 |
2022-07-13 08:59:11 | Normal | Server 3 | System 3 |
2022-07-13 08:59:12 | Warning | Server 35 | System 27 |
2022-07-13 08:59:13 | Warning | Server 5 | System 5 |
2022-07-13 08:59:14 | Warning | Server 9 | System 6 |
2022-07-13 08:59:15 | Warning | Server 8 | System 7 |
2022-07-13 08:59:16 | Error | Server 12 | System 8 |
2022-07-13 08:59:17 | Error | Server 15 | System 9 |
2022-07-13 08:59:18 | Warning | Server 29 | System 10 |
2022-07-13 08:59:19 | Warning | Server 22 | System 11 |
2022-07-13 08:59:20 | Warning | Server 13 | System 12 |
2022-07-13 08:59:21 | Normal | Server 16 | System 13 |
2022-07-13 08:59:22 | Normal | Server 19 | System 14 |
2022-07-13 08:59:23 | Normal | Server 21 | System 15 |
2022-07-13 08:59:24 | Warning | Server 24 | System 16 |
2022-07-13 08:59:25 | Warning | Server 27 | System 17 |
2022-07-13 08:59:26 | Warning | Server 25 | System 18 |
2022-07-13 08:59:27 | Error | Server 30 | System 23 |
2022-07-13 08:59:28 | Error | Server 31 | System 20 |
I need to write a Postgres query which will give me output like:
Start_Datetime | End_Datetime | Server_Start | Server_End | System_Start | System_End |
---|---|---|---|---|---|
2022-07-13 08:59:12 | 2022-07-13 08:59:15 | Server 35 | Server 8 | System 27 | System 7 |
2022-07-13 08:59:24 | 2022-07-13 08:59:26 | Server 24 | Server 25 | System 16 | System 18 |
If there is any 'Error' message, I need to take the start time of 'Warning' and end time of 'Warning' - and corresponding server
and system
! If there is no 'Error' message after 'Warning' ignore it e.g. after the warning at 2022-07-13 08:59:20 there is no error, so the query should ignore that range.
Setup table queries:
CREATE TABLE test_data (
id integer PRIMARY KEY
, message varchar(10)
, datetimestamp timestamp NOT NULL
, server varchar(10)
, system varchar(10)
);
INSERT INTO test_data VALUES
(09, 'Normal' , '2022-07-13 08:59:09', 'Server 1' , 'System 1')
, (10, 'Normal' , '2022-07-13 08:59:10', 'Server 4' , 'System 2')
, (11, 'Normal' , '2022-07-13 08:59:11', 'Server 3' , 'System 3')
, (12, 'Warning', '2022-07-13 08:59:12', 'Server 35', 'System 27')
, (13, 'Warning', '2022-07-13 08:59:13', 'Server 5' , 'System 5')
, (14, 'Warning', '2022-07-13 08:59:14', 'Server 9' , 'System 6')
, (15, 'Warning', '2022-07-13 08:59:15', 'Server 8' , 'System 7')
, (16, 'Error' , '2022-07-13 08:59:16', 'Server 12', 'System 8')
, (17, 'Error' , '2022-07-13 08:59:17', 'Server 15', 'System 9')
, (18, 'Warning', '2022-07-13 08:59:18', 'Server 29', 'System 10')
, (19, 'Warning', '2022-07-13 08:59:19', 'Server 22', 'System 11')
, (20, 'Warning', '2022-07-13 08:59:20', 'Server 13', 'System 12')
, (21, 'Normal' , '2022-07-13 08:59:21', 'Server 16', 'System 13')
, (22, 'Normal' , '2022-07-13 08:59:22', 'Server 19', 'System 14')
, (23, 'Normal' , '2022-07-13 08:59:23', 'Server 21', 'System 15')
, (24, 'Warning', '2022-07-13 08:59:24', 'Server 24', 'System 16')
, (25, 'Warning', '2022-07-13 08:59:25', 'Server 27', 'System 17')
, (26, 'Warning', '2022-07-13 08:59:26', 'Server 25', 'System 18')
, (27, 'Error' , '2022-07-13 08:59:27', 'Server 30', 'System 23')
, (28, 'Error' , '2022-07-13 08:59:28', 'Server 31', 'System 20')
;
CodePudding user response:
This is a kind of gaps and islands problem. First detect groups of messages and then find lead / lag group kind. Take the combinations needed.
with mg as(
select message, min(id) id1, max(id) id2
, lag(message, 1, 'Normal') over(order by min(id)) prevmsg
, lead(message,1, 'Error') over(order by min(id)) nxtmsg
from (
select *, row_number() over(order by datetimestamp) - row_number() over(partition by message order by datetimestamp) grp
from test_data
) t
group by message, grp
)
select mg.message, mg.id1, mg.id2
from mg
where mg.message ='Warning' and mg.prevmsg='Normal' and mg.nxtmsg='Error'
With your test data it will return
message id1 id2
Warning 12 15
Warning 24 26
Join it to the original data twice to select columns you want for id1 and id2.
P.S. Correct you sample data INSERT, single quotes are missing.
CodePudding user response:
You may try the following:
With cte As
(
Select id, message,datetimestamp,
Coalesce(Lead(message) Over (Order By datetimestamp,id) , 'last') as ld,
Coalesce(lag(message) Over (Order By datetimestamp,id) , 'first') as lg
From test_data
),
cte2 As
(
Select id,message,ld,lg,
Coalesce(Lag(id) over (order by datetimestamp,id),id) lastWarningId,
Coalesce(Lag(id, 2) over (order by datetimestamp,id),
Lag(id) over (order by datetimestamp,id)) firstWarningId
From cte where message<> ld or message <> lg
)
Select T.datetimestamp as Start_DateTime, T2.datetimestamp as End_DateTime,
T.server as Server_Start, T2.server as Server_End,
T.system as System_Start, T2.system as System_End
From cte2 C
Join test_data T On T.id = C.firstwarningid
Join test_data T2 On T2.id = C.lastwarningid
Where C.message='Error' And C.lg='Warning'
See a demo from uk<>fiddle.
The idea of this query is to find the first and last id
for each group of messages, which is done in cet2
. Where:
Lag(id) over (order by datetimestamp,id)
is the lastid
of each group.
Lag(id, 2) over (order by datetimestamp,id)
is the first id
of each group.
Then for each group of message='Error'
find the first and last id
of the nearest group of message='Warning'
, which done by Where C.message='Error' And C.lg='Warning'
.
To get more details about how it works, try to select * from cet2 Where message='Error' And lg='Warning'
.
CodePudding user response:
Elaborating on my answer to your previous question, this does the job:
WITH cte AS (
SELECT grp, datetimestamp, message, server, system
, bool_or(message = 'Error' AND last_msg = 'Warning') OVER (PARTITION BY grp) AS report -- qualifies for report
FROM (
SELECT *
, count(*) FILTER (WHERE message = 'Warning' AND last_msg <> 'Warning') OVER (ORDER BY datetimestamp) AS grp
FROM (
SELECT datetimestamp, message, server, system
, lag(message) OVER (ORDER BY datetimestamp) AS last_msg
FROM test_data
) sub1
WHERE message IN ('Warning', 'Error') -- trim noise early
) sub2
)
SELECT grp
, a.datetimestamp AS start_time, z.datetimestamp AS end_time
, a.server AS start_server, z.server AS end_server
, a.system AS start_system, z.system AS end_system
FROM (
SELECT DISTINCT ON (grp) *
FROM cte
WHERE report
ORDER BY grp, datetimestamp
) a
JOIN (
SELECT DISTINCT ON (grp) *
FROM cte
WHERE report
AND message = 'Warning' -- exclude trailing errors
ORDER BY grp, datetimestamp DESC
) z USING (grp);
db<>fiddle here
Produces your desired result exactly.
This relies on unique timestamps. If there can be duplicates, work with id
additionally (or break ties some other way).
Move the base query from the previous answer to a CTE, and add the tag report
to mark qualifying rows - we are going to reference it multiple times.
In the outer query, use two subqueries with DISTINCT ON
to retrieve first and last row per group, then join ...
See: