I have table like below
customer | date | end date |
---|---|---|
1 | jan 1 2021 | jan 30 2021 |
1 | jan 2 2021 | jan 31 2021 |
1 | jan 3 2021 | feb 1 2021 |
1 | jan 27 2021 | feb 26 2021 |
1 | feb 3 2021 | mar 5 2021 |
2 | jan 2 2021 | jan 31 2021 |
2 | jan 10 2021 | feb 9 2021 |
2 | feb 10 2021 | mar 12 2021 |
Now, I wanted to update the value in the 'end date' column of a row based on the values in the previous row 'end date' and the current row 'date'. Say if the date in current row < end date of the previous row, I wanted to update the end date of the current row = (end date of the previous row). I Wanted to do this repeated for all the rows (grouped by customer).
I want the output as below. Just need it in the select statement instead of a updating/inserting in a table.
Note - in below as the second row(end date) is updated with the value in the first row (jan 30 2021), now the third row value (jan 3 2021) is evaluated against the updated value in the second row (which is jan 30 2021) but not with the second row value before update (jan 31 2021).
customer | date | end date |
---|---|---|
1 | jan 1 2021 | jan 30 2021 |
1 | jan 2 2021 | jan 30 2021 [updated because current date < previous end date] |
1 | jan 3 2021 | jan 30 2021[updated because current date < previous end date] |
1 | jan 27 2021 | jan 30 2021 [updated because current date < previous end date] |
1 | feb 3 2021 | mar 5 2021 |
2 | jan 2 2021 | jan 31 2021 |
2 | jan 10 2021 | jan 31 2021[updated because current date < previous end date] |
2 | feb 10 2021 | mar 12 2021 |
CodePudding user response:
I think I should go this way. I use the datasource twice just to get the way its needed to perform the operation without updating or inserting into the table.
input table:
1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-31
1|2021-01-03|2021-02-01
1|2021-01-27|2021-02-26
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-02-09
2|2021-02-10|2021-03-12
code:
with num_raw_data as (
SELECT row_number() over(partition by customer)as num, customer,date_init,date_end
FROM `project-id.data-set.table`
), analyzed_data as(
select r.num,
r.customer,
r.date_init,
r.date_end,
case when date_init<(select date_end from num_raw_data where num=r.num-1 and customer=r.customer and EXTRACT(month FROM r.date_init)=EXTRACT(month FROM date_init)) then 1 else 0 end validation
from num_raw_data r
)
select customer,
date_init,
case when validation !=0 then (select MIN(date_end) from analyzed_data where validation=0 and customer=ad.customer and date_init<ad.date_end) else date_end end as date_end
from analyzed_data ad
order by customer,num
output:
1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-30
1|2021-01-03|2021-01-30
1|2021-01-27|2021-01-30
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-01-31
2|2021-02-10|2021-03-12
Using column validation
from analyzed_data
to get to know where I should be looking for changes. I'm not sure if its fast (probably not) but it works for the scenario you bring in your question.