I have a table registering different event dates for unique identifiers - one row for each Id-event-date combination. Each unique identifier can have multiple dates, according to the number of events happening to that Id. There are no two events occurring on the same date for individual Ids.
First, I wanted to identify the unique Ids for which the latest event has occurred for more than 7 days to current date. This is my working approach:
WITH latest_event AS
(
SELECT Id,
date,
event,
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY date DESC) AS row_number
FROM `table`
),
SELECT Id,
date,
event,
FROM latest_stage
WHERE row_number = 1
AND DATE_DIFF(CURRENT_DATE(),date, DAY) > 7
ORDER BY date
Now I would like to obtain the same information, but on a monthly basis, instead of just current date. For instance for every month in 2021 and 2022, identify the number of unique Ids which, by the end of each month, did not have an event in the last 7 days. I am struggling to understand how to do this calculation for every month in these two years.
CodePudding user response:
First we generate some data in table tbl
.
We need to check an certain times (1st of month for last month). These combination are created in table all_combinations
. The dates are tuncated to the first of the month and a month is added, because the check of the last month is done on the 1st of the next one.
Both the tbl
and the all_combinations
are combined (union all
) and marked by the test
column.
I commented out two helper lines in the main SELECT
. The check is done on the 1st of the month for the last one and thus we calculate the date_check
by substracting one day.
The window function last_date_window
looks for the last date of the id
(partition by
). The lag
function does not work, because we need to exclude the test dates with an if
statement, the last_value
function is used.
We substract the date from the previous date. If the difference is null, because it is the first entry, we set it to 99 days. If the difference is greater than 7 days the qualify
will keep that record.
with tbl as
(Select cast(rand()*10 as int64) Id, date,"E" event from unnest(generate_date_array("2021-01-01","2022-09-23",interval 1 day)) as date),
all_combinations as
(Select *, "" event, true test
from (Select distinct id from tbl)
cross join (Select distinct date_add(date_trunc(date,month),interval 1 month) as date from tbl))
select *,
#last_value(if(test,null,date) ignore nulls) over (last_date_window ) as last_date,
#ifnull(date_diff(date, last_value(date) over last_date_window,day),99) as diff,
date_sub(date,interval 1 day) as date_check
from (select *, false as test from tbl union all select * from all_combinations)
qualify
ifnull(date_diff(date, last_value(date) over last_date_window,day),99) >7 1 and test
window
last_date_window as (partition by id order by date,test desc rows between unbounded preceding and 1 preceding)
order by date,test desc