Home > Net >  How to take aggregations for a repeating window in BigQuery
How to take aggregations for a repeating window in BigQuery

Time:03-26

I have a dataset where there is an id which has steps. Each step has a timestamp and the channel name. A channel can repeat several times at multiple timestamps for a given id.

I am trying to measure for each chunk of the repeating channels (ordered by the timestamp), how many occurrences happen?

Here is my sample data -

with temp as (
select 1 as id, '2019-08-02 13:13:27 UTC' as t_date, 'email' as channel union all
select 1 as id, '2019-08-02 13:14:27 UTC' as t_date, 'email' as channel union all
select 1 as id, '2019-08-02 13:15:27 UTC' as t_date, 'display' as channel union all
select 1 as id, '2019-08-02 13:16:27 UTC' as t_date, 'display' as channel union all
select 1 as id, '2019-08-02 13:17:27 UTC' as t_date, 'email' as channel union all
select 1 as id, '2019-08-02 13:18:27 UTC' as t_date, 'email' as channel union all
select 2 as id, '2019-08-02 13:11:27 UTC' as t_date, 'email' as channel union all
select 2 as id, '2019-08-02 13:12:27 UTC' as t_date, 'email' as channel union all
select 2 as id, '2019-08-02 13:13:27 UTC' as t_date, 'email' as channel union all
select 2 as id, '2019-08-02 13:14:27 UTC' as t_date, 'email' as channel 
)

select id, channel , count(1) appearances
from temp
group by id , channel
order by id

which gives me output as enter image description here

However, I need something like this - enter image description here

As shown in the output, for each sequences of the channels occurring together, I need to count the appearances and the start and end times. For instance, the first record in the output belongs to email channel that started for id = 1 at 2019-08-02 13:13:27 UTC and ended at 2019-08-02 13:14:27 UTC - ordered by timestamp. The last column shows many how many times email channel repeats before it changes to next one (display in this case).

How can I achieve this in BigQuery?

CodePudding user response:

Consider below approach

select id, channel, 
  min(t_date) as start_date, 
  max(t_date) as end_date, 
  count(1) as appearances
from (
  select *, countif(new_group) over (partition by id order by t_date) group_id 
  from (
    select *, ifnull(channel != lag(channel) over win, true) new_group
    from temp
    window win as (partition by id order by t_date)
  )
)
group by id, channel, group_id               

if applied to sample data in your question - output is

enter image description here

  • Related