big query SQL - repeatedly/recursively change a row's column in the select statement based on t-CodePudding

I have table like below

customer	date	end date
1	jan 1 2021	jan 30 2021
1	jan 2 2021	jan 31 2021
1	jan 3 2021	feb 1 2021
1	jan 27 2021	feb 26 2021
1	feb 3 2021	mar 5 2021
2	jan 2 2021	jan 31 2021
2	jan 10 2021	feb 9 2021
2	feb 10 2021	mar 12 2021

Now, I wanted to update the value in the 'end date' column of a row based on the values in the previous row 'end date' and the current row 'date'. Say if the date in current row < end date of the previous row, I wanted to update the end date of the current row = (end date of the previous row). I Wanted to do this repeated for all the rows (grouped by customer).

I want the output as below. Just need it in the select statement instead of a updating/inserting in a table.

Note - in below as the second row(end date) is updated with the value in the first row (jan 30 2021), now the third row value (jan 3 2021) is evaluated against the updated value in the second row (which is jan 30 2021) but not with the second row value before update (jan 31 2021).

customer	date	end date
1	jan 1 2021	jan 30 2021
1	jan 2 2021	jan 30 2021 [updated because current date < previous end date]
1	jan 3 2021	jan 30 2021[updated because current date < previous end date]
1	jan 27 2021	jan 30 2021 [updated because current date < previous end date]
1	feb 3 2021	mar 5 2021
2	jan 2 2021	jan 31 2021
2	jan 10 2021	jan 31 2021[updated because current date < previous end date]
2	feb 10 2021	mar 12 2021

CodePudding user response：

I think I should go this way. I use the datasource twice just to get the way its needed to perform the operation without updating or inserting into the table.

input table:

1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-31 
1|2021-01-03|2021-02-01
1|2021-01-27|2021-02-26
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-02-09
2|2021-02-10|2021-03-12

code:

with num_raw_data as (
    SELECT row_number() over(partition by customer)as num, customer,date_init,date_end  
    FROM `project-id.data-set.table`
), analyzed_data as(
    select r.num,
    r.customer,
    r.date_init,
    r.date_end,
    case when date_init<(select date_end from num_raw_data where num=r.num-1 and customer=r.customer and EXTRACT(month FROM r.date_init)=EXTRACT(month FROM date_init)) then 1 else 0 end validation
    from num_raw_data r 
)
select customer,
date_init,
case when validation !=0 then (select MIN(date_end) from analyzed_data where validation=0 and customer=ad.customer and date_init<ad.date_end) else date_end end as date_end
from analyzed_data ad
order by customer,num

output:

1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-30
1|2021-01-03|2021-01-30 
1|2021-01-27|2021-01-30 
1|2021-02-03|2021-03-05 
2|2021-01-02|2021-01-31
2|2021-01-10|2021-01-31 
2|2021-02-10|2021-03-12

Using column validation from analyzed_data to get to know where I should be looking for changes. I'm not sure if its fast (probably not) but it works for the scenario you bring in your question.