Home > Enterprise >  How to get list of latest event per Id and month based on date interval?
How to get list of latest event per Id and month based on date interval?

Time:10-05

I have a table registering different event dates for unique identifiers - one row for each Id-event-date combination. Each unique identifier can have multiple dates, according to the number of events happening to that Id. There are no two events occurring on the same date for individual Ids.

First, I wanted to identify the unique Ids for which the latest event has occurred for more than 7 days to current date. This is my working approach:

WITH latest_event AS
(
SELECT Id,
       date,
       event,
       ROW_NUMBER() OVER(PARTITION BY Id ORDER BY date DESC) AS row_number
  FROM `table`
),

SELECT Id,
       date,
       event,
  FROM latest_stage 
 WHERE row_number = 1
   AND DATE_DIFF(CURRENT_DATE(),date, DAY) > 7
 ORDER BY date

Now I would like to obtain the same information, but on a monthly basis, instead of just current date. For instance for every month in 2021 and 2022, identify the number of unique Ids which, by the end of each month, did not have an event in the last 7 days. I am struggling to understand how to do this calculation for every month in these two years.

CodePudding user response:

First we generate some data in table tbl.

We need to check an certain times (1st of month for last month). These combination are created in table all_combinations. The dates are tuncated to the first of the month and a month is added, because the check of the last month is done on the 1st of the next one.

Both the tbl and the all_combinations are combined (union all) and marked by the test column.

I commented out two helper lines in the main SELECT. The check is done on the 1st of the month for the last one and thus we calculate the date_check by substracting one day.

The window function last_date_window looks for the last date of the id (partition by). The lag function does not work, because we need to exclude the test dates with an if statement, the last_value function is used.

We substract the date from the previous date. If the difference is null, because it is the first entry, we set it to 99 days. If the difference is greater than 7 days the qualify will keep that record.

with tbl as 
(Select cast(rand()*10 as int64) Id,  date,"E" event from unnest(generate_date_array("2021-01-01","2022-09-23",interval 1 day)) as date),
all_combinations as 
(Select *, "" event, true test 
  from (Select distinct id from tbl) 
  cross join (Select distinct date_add(date_trunc(date,month),interval 1 month) as date from tbl))
select *,
#last_value(if(test,null,date) ignore nulls) over (last_date_window  ) as last_date,
#ifnull(date_diff(date, last_value(date) over last_date_window,day),99) as diff,
date_sub(date,interval 1 day) as date_check
 from (select *, false as test from tbl union all select * from all_combinations)

qualify 
  ifnull(date_diff(date, last_value(date) over last_date_window,day),99)  >7 1 and test
 window
  last_date_window as (partition by id order by date,test desc rows between unbounded preceding and 1 preceding)

 
order  by date,test desc
  • Related