Home > Enterprise >  PostgreSQL : Cumulative sum
PostgreSQL : Cumulative sum

Time:04-06

I'm having a query (see below) that provide me the following result:

Table input

period age_group tot_dose
2021-01-18 00:00:00 01 0-4 yo 1
2021-03-08 00:00:00 01 05-11 yo 1
2022-01-03 00:00:00 01 05-11 yo 39
2022-01-10 00:00:00 01 05-11 yo 109
2022-01-17 00:00:00 01 05-11 yo 556
2022-01-24 00:00:00 01 05-11 yo 385
2022-01-31 00:00:00 01 05-11 yo 173
2022-02-07 00:00:00 01 05-11 yo 70
2022-02-14 00:00:00 01 05-11 yo 26
2022-02-21 00:00:00 01 05-11 yo 30
2022-02-28 00:00:00 01 05-11 yo 24
2022-03-07 00:00:00 01 05-11 yo 20
2022-03-14 00:00:00 01 05-11 yo 10
2022-03-21 00:00:00 01 05-11 yo 16
2021-06-28 00:00:00 02 12-14 yo 1
2021-07-05 00:00:00 02 12-14 yo 54
2021-07-12 00:00:00 02 12-14 yo 134
2021-07-19 00:00:00 02 12-14 yo 263
2021-07-26 00:00:00 02 12-14 yo 185
2021-08-02 00:00:00 02 12-14 yo 138
2021-08-09 00:00:00 02 12-14 yo 193
2021-08-16 00:00:00 02 12-14 yo 172
2021-08-23 00:00:00 02 12-14 yo 165
2021-08-30 00:00:00 02 12-14 yo 340
2021-09-06 00:00:00 02 12-14 yo 283
.. .. ..

I would like to calculate the cumulative sum based on the age_group criteria That is, providing the following result:

Table output

period age_group tot_dose
2021-01-18 00:00:00 01 0-4 yo 1
2021-03-08 00:00:00 01 05-11 yo 1
2022-01-03 00:00:00 01 05-11 yo 40
2022-01-10 00:00:00 01 05-11 yo 149
2022-01-17 00:00:00 01 05-11 yo 705
2022-01-24 00:00:00 01 05-11 yo 1090
2022-01-31 00:00:00 01 05-11 yo 1263
2022-02-07 00:00:00 01 05-11 yo 1333
2022-02-14 00:00:00 01 05-11 yo 1359
2022-02-21 00:00:00 01 05-11 yo 1389
2022-02-28 00:00:00 01 05-11 yo 1413
2022-03-07 00:00:00 01 05-11 yo 1433
2022-03-14 00:00:00 01 05-11 yo 1443
2022-03-21 00:00:00 01 05-11 yo 1459
2021-06-28 00:00:00 02 12-14 yo 1
2021-07-05 00:00:00 02 12-14 yo 55
2021-07-12 00:00:00 02 12-14 yo 189
.. .. ..

So being given the query providing the Table input above:

with data as (
  select date_trunc('week', date) AS period,
CASE
        WHEN date_part('year',age(birth_date)) BETWEEN 0 AND 4 THEN '0-4 yo'
        WHEN date_part('year',age(birth_date)) BETWEEN 5 AND 11 THEN '05-11 yo'
        WHEN date_part('year',age(birth_date)) BETWEEN 12 AND 14 THEN '12-14 yo'
        WHEN date_part('year',age(birth_date)) BETWEEN 15 AND 19 THEN '15-19 yo'
        WHEN date_part('year',age(birth_date)) BETWEEN 20 AND 24 THEN '20-24 yo'
        WHEN date_part('year',age(birth_date)) BETWEEN 25 AND 34 THEN '25-34 yo'
        WHEN date_part('year',age(birth_date)) BETWEEN 35 AND 44 THEN '35-44 yo'
        WHEN date_part('year',age(birth_date)) BETWEEN 45 AND 54 THEN '45-54 yo'
        WHEN date_part('year',age(birth_date)) BETWEEN 55 AND 64 THEN '55-64 yo'
        WHEN date_part('year',age(birth_date)) BETWEEN 65 AND 74 THEN '65-74 yo'
        ELSE '75  yo' END AS age_group,
    count(dose_number) as tot_dose
  from foo
  where..
  group by period, age_group
)

How should the select query be written to get the Table output above ?

I tried :

select
period, age_group,
  sum(tot_dose) over (order by age_group asc rows between unbounded preceding and current row)
FROM data

But it will provide the cumulative sum independently of the age_group criteria.

Questions

  1. The parameters rows between unbounded preceding and current row should be modified to get the expected value, but not familiar with the way to set them.

  2. (Optional) The with "query" will group by week. However, sometime, some week will be missing as for this given week there will be no data which is normal being given the query. How could the query be rewritten to get those missing weeks with the current age group and 0 for tot_dose ?

CodePudding user response:

In case this might help someone, got it right with:

select
period, age_group,
  sum(tot_dose) over (PARTITION BY age_group ORDER BY period)
FROM data 

CodePudding user response:

The should query

select period,
       age_group,
       tot_dose,
       sum(tot_dose) over(partition by age_group order by age_group  rows BETWEEN
           UNBOUNDED PRECEDING AND
           current row)
from window_function

will produce the following results as shown in table two above

  • Related