Home > Back-end >  Project data and cumulative sum forward
Project data and cumulative sum forward

Time:10-24

I am trying to push the last value of a cumulative dataset forward to present time.

Initialise test data:

drop table if exists test_table;

create table test_table
as select data_date::date, floor(random() * 10) as data_value
from
generate_series('2021-08-25'::date, '2021-08-31'::date, '1 day') data_date;

The above test data produces something like this:

data_date    data_value   cumulative_value
2021-08-25       1               1
2021-08-26       7               8
2021-08-27       8              16
2021-08-28       7              23
2021-08-29       2              25
2021-08-30       2              27
2021-08-31       7              34

What I wish to do, is push the last data value (2021-08-31 7) forward to present time. For example, say today's date was 2021-09-03, I would want the result to be something like:

data_date    data_value   cumulative_value
2021-08-25       1               1
2021-08-26       7               8
2021-08-27       8              16
2021-08-28       7              23
2021-08-29       2              25
2021-08-30       2              27
2021-08-31       7              34
2021-09-01       7              41
2021-09-02       7              48
2021-09-03       7              55

CodePudding user response:

You need to get the value of the last date in the table. Common table expression is a good way to do that:

with cte as (
    select data_value as last_val
    from test_table 
    order by data_date desc 
    limit 1)

select 
    gen_date::date as data_date, 
    coalesce(data_value, last_val) as data_value, 
    sum(coalesce(data_value, last_val)) over (order by gen_date) as cumulative_sum
from generate_series('2021-08-25'::date, '2021-09-03', '1 day') as gen_date
left join test_table on gen_date = data_date
cross join cte

Test it in db<>fiddle.

CodePudding user response:

You may use union and a scalar subquery to find the latest value of data_value for for the new rows. cumulative_value is re-evaluated.

select *, sum(data_value) over (rows between unbounded preceding and current row) as cumulative_value
from 
(
 select data_date, data_value from test_table
 UNION all 
 select rd, (select data_value from test_table where data_date = '2021-08-31')
 from generate_series('2021-09-01'::date, '2021-09-03', '1 day') rd
) t 
order by data_date;

And here it is a bit smarter w/o fixed date literals.

with cte(latest_date) as (select max(data_date) from test_table) 
select *, sum(data_value) over (rows between unbounded preceding and current row) as cumulative_value
from 
(
 select data_date, data_value from test_table
 UNION ALL 
 select rd::date, (select data_value from test_table, cte where data_date = latest_date)
 from generate_series((select latest_date from cte)   1, CURRENT_DATE, '1 day') rd
) t 
order by data_date;

SQL Fiddle here.

  • Related