I am still on the ropes with BigQuery and I hope you can help me.
I have a data set of failure status from different users in the past 2 months, the scans are done once per day.
Basically, my data set looks like this:
WITH failure_table AS
(SELECT 'Andrea' AS name, 'Failure' AS status, '2022-04-28 4:00:00' AS timestamp
UNION ALL SELECT 'Karl', 'Failure', '2022-04-28 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-04-27 4:00:00'
UNION ALL SELECT 'Karl', 'Failure', '2022-04-27 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-04-26 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-04-25 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-03-30 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-03-29 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-03-28 4:00:00'
UNION ALL SELECT 'Karl', 'Failure', '2022-03-28 4:00:00')
What I wanted to output is the timestamp in which a user first committed a failure, and consecutively commits a failure status every day, leading up to today (2022-04-29).
So in this case, both Andrea and Karl's failures from March will not be considered for analysis, since after March 30, they had success marks and got a failure again only in late April. (I'm not sure if I'm making sense, do let me know).
So my expected output is,
name | status | started failing timestamp | days failing |
---|---|---|---|
Andrea | Failure | 2022-04-25 4:00:00 | 4 |
Karl | Failure | 2022-04-27 4:00:00 | 2 |
I hope someone can help. Thank you!
CodePudding user response:
Use below approach
select name, status,
min(timestamp) as started_failing_timestamp,
date_diff(max(date(timestamp)), min(date(timestamp)), day) 1 as days_failing
from (
select * except(flag), flag, countif(flag > 1) over(partition by name order by timestamp) grp
from (
select *, date_diff(date(timestamp), lag(date(timestamp)) over(partition by name order by timestamp), day) flag
from failure_table
where status = 'Failure'
)
)
group by name, status, grp
qualify 1 = row_number() over(partition by name order by grp desc)
if applied to sample data in your question - output is