Home > OS >  How to differentiate iteration using date filed in bigquery
How to differentiate iteration using date filed in bigquery

Time:01-14

I have a process that occur every 30 days but can take few days. How can I differentiate between each iteration in order to sum the output of the process?

for Example

1 the output I except is

Name Date amount iteration (optional)
Sophia Liu 2016-01-01 4 1
Sophia Liu 2016-02-01 5 2
Nikki Leith 2016-01-02 5 1
Nikki Leith 2016-02-01 10 2

I tried using lag function on the date filed and using the difference between that column and the date column.

WITH base AS
 (SELECT 'Sophia Liu' as name, DATE '2016-01-01' as date, 3 as amount
  UNION ALL SELECT 'Sophia Liu', DATE '2016-01-02', 1
  UNION ALL SELECT 'Sophia Liu', DATE '2016-02-01', 3
  UNION ALL SELECT 'Sophia Liu', DATE '2016-02-02', 2
  UNION ALL SELECT 'Nikki Leith', DATE '2016-01-02', 5
  UNION ALL SELECT 'Nikki Leith', DATE '2016-02-01', 5
  UNION ALL SELECT 'Nikki Leith', DATE '2016-02-02', 3
  UNION ALL SELECT 'Nikki Leith', DATE '2016-02-03', 1
  UNION ALL SELECT 'Nikki Leith', DATE '2016-02-04', 1)


select 
name
,date
,lag(date) over (partition by name order by date) as lag_func
,date_diff(date,lag(date) over (partition by name order by date),day) date_differacne
,case when date_diff(date,lag(date) over (partition by name order by date),day) >= 10 
or date_diff(date,lag(date) over (partition by name order by date),day) is null then true else false end as new_iteration
,amount
from base

CodePudding user response:

Edited answer

After your clarification and looking at what's actually in your SQL code. I'm guessing you are looking for a solution to what's called a gaps and islands problem. That is, you want to identify the "islands" of activity and sum the amount for each iteration or island. Taking your example you can first identify the start of a new session (or "gap") and then use that to create a unique iteration ("island") identifier for each user. You can then use that identifier to perform a SUM().

gaps as (
  select
    name,
    date,
    amount,
    if(date_diff(date, lag(date,1) over(partition by name order by date), DAY) >= 10, 1, 0) new_iteration
    from base
),

islands as (
  select
 *,
 1   sum(new_iteration) over(partition by name order by date) iteration_id
  from gaps
)

select
  *,
  sum(amount) over(partition by name, iteration_id) iteration_amount
from islands

Previous answer

Sounds like you just need a RANK() to count the iterations in your window functions. Depending on your need you can then sum cumulative or total amounts in a similar window function. Something like this:

select 
  name
 ,date
 ,rank() over (partition by name order by date) as iteration
 ,sum(amount) over (partition by name order by date) as cumulative_amount
 ,sum(amount) over (partition by name) as total_amount
 ,amount
from base
  • Related