Home > Software engineering >  Output the last instance of failure status from historical data in Bigquery
Output the last instance of failure status from historical data in Bigquery

Time:04-29

I am still on the ropes with BigQuery and I hope you can help me.

I have a data set of failure status from different users in the past 2 months, the scans are done once per day.

Basically, my data set looks like this:

WITH failure_table AS
  (SELECT 'Andrea' AS name, 'Failure' AS status, '2022-04-28 4:00:00' AS timestamp
   UNION ALL SELECT 'Karl', 'Failure', '2022-04-28 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-04-27 4:00:00'
   UNION ALL SELECT 'Karl', 'Failure', '2022-04-27 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-04-26 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-04-25 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-03-30 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-03-29 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-03-28 4:00:00'
   UNION ALL SELECT 'Karl', 'Failure', '2022-03-28 4:00:00')

What I wanted to output is the timestamp in which a user first committed a failure, and consecutively commits a failure status every day, leading up to today (2022-04-29).

So in this case, both Andrea and Karl's failures from March will not be considered for analysis, since after March 30, they had success marks and got a failure again only in late April. (I'm not sure if I'm making sense, do let me know).

So my expected output is,

name status started failing timestamp days failing
Andrea Failure 2022-04-25 4:00:00 4
Karl Failure 2022-04-27 4:00:00 2

I hope someone can help. Thank you!

CodePudding user response:

Use below approach

select name, status,  
  min(timestamp) as started_failing_timestamp,
  date_diff(max(date(timestamp)), min(date(timestamp)), day)   1 as days_failing
from (
  select * except(flag), flag, countif(flag > 1) over(partition by name order by timestamp) grp
  from (
    select *, date_diff(date(timestamp), lag(date(timestamp)) over(partition by name order by timestamp), day) flag
    from failure_table
    where status = 'Failure'
  )
)
group by name, status, grp
qualify 1 = row_number() over(partition by name order by grp desc)            

if applied to sample data in your question - output is

enter image description here

  • Related