I'm having a query (see below) that provide me the following result:
Table input
period | age_group | tot_dose |
---|---|---|
2021-01-18 00:00:00 01 | 0-4 yo | 1 |
2021-03-08 00:00:00 01 | 05-11 yo | 1 |
2022-01-03 00:00:00 01 | 05-11 yo | 39 |
2022-01-10 00:00:00 01 | 05-11 yo | 109 |
2022-01-17 00:00:00 01 | 05-11 yo | 556 |
2022-01-24 00:00:00 01 | 05-11 yo | 385 |
2022-01-31 00:00:00 01 | 05-11 yo | 173 |
2022-02-07 00:00:00 01 | 05-11 yo | 70 |
2022-02-14 00:00:00 01 | 05-11 yo | 26 |
2022-02-21 00:00:00 01 | 05-11 yo | 30 |
2022-02-28 00:00:00 01 | 05-11 yo | 24 |
2022-03-07 00:00:00 01 | 05-11 yo | 20 |
2022-03-14 00:00:00 01 | 05-11 yo | 10 |
2022-03-21 00:00:00 01 | 05-11 yo | 16 |
2021-06-28 00:00:00 02 | 12-14 yo | 1 |
2021-07-05 00:00:00 02 | 12-14 yo | 54 |
2021-07-12 00:00:00 02 | 12-14 yo | 134 |
2021-07-19 00:00:00 02 | 12-14 yo | 263 |
2021-07-26 00:00:00 02 | 12-14 yo | 185 |
2021-08-02 00:00:00 02 | 12-14 yo | 138 |
2021-08-09 00:00:00 02 | 12-14 yo | 193 |
2021-08-16 00:00:00 02 | 12-14 yo | 172 |
2021-08-23 00:00:00 02 | 12-14 yo | 165 |
2021-08-30 00:00:00 02 | 12-14 yo | 340 |
2021-09-06 00:00:00 02 | 12-14 yo | 283 |
.. | .. | .. |
I would like to calculate the cumulative sum based on the age_group criteria That is, providing the following result:
Table output
period | age_group | tot_dose |
---|---|---|
2021-01-18 00:00:00 01 | 0-4 yo | 1 |
2021-03-08 00:00:00 01 | 05-11 yo | 1 |
2022-01-03 00:00:00 01 | 05-11 yo | 40 |
2022-01-10 00:00:00 01 | 05-11 yo | 149 |
2022-01-17 00:00:00 01 | 05-11 yo | 705 |
2022-01-24 00:00:00 01 | 05-11 yo | 1090 |
2022-01-31 00:00:00 01 | 05-11 yo | 1263 |
2022-02-07 00:00:00 01 | 05-11 yo | 1333 |
2022-02-14 00:00:00 01 | 05-11 yo | 1359 |
2022-02-21 00:00:00 01 | 05-11 yo | 1389 |
2022-02-28 00:00:00 01 | 05-11 yo | 1413 |
2022-03-07 00:00:00 01 | 05-11 yo | 1433 |
2022-03-14 00:00:00 01 | 05-11 yo | 1443 |
2022-03-21 00:00:00 01 | 05-11 yo | 1459 |
2021-06-28 00:00:00 02 | 12-14 yo | 1 |
2021-07-05 00:00:00 02 | 12-14 yo | 55 |
2021-07-12 00:00:00 02 | 12-14 yo | 189 |
.. | .. | .. |
So being given the query providing the Table input above:
with data as (
select date_trunc('week', date) AS period,
CASE
WHEN date_part('year',age(birth_date)) BETWEEN 0 AND 4 THEN '0-4 yo'
WHEN date_part('year',age(birth_date)) BETWEEN 5 AND 11 THEN '05-11 yo'
WHEN date_part('year',age(birth_date)) BETWEEN 12 AND 14 THEN '12-14 yo'
WHEN date_part('year',age(birth_date)) BETWEEN 15 AND 19 THEN '15-19 yo'
WHEN date_part('year',age(birth_date)) BETWEEN 20 AND 24 THEN '20-24 yo'
WHEN date_part('year',age(birth_date)) BETWEEN 25 AND 34 THEN '25-34 yo'
WHEN date_part('year',age(birth_date)) BETWEEN 35 AND 44 THEN '35-44 yo'
WHEN date_part('year',age(birth_date)) BETWEEN 45 AND 54 THEN '45-54 yo'
WHEN date_part('year',age(birth_date)) BETWEEN 55 AND 64 THEN '55-64 yo'
WHEN date_part('year',age(birth_date)) BETWEEN 65 AND 74 THEN '65-74 yo'
ELSE '75 yo' END AS age_group,
count(dose_number) as tot_dose
from foo
where..
group by period, age_group
)
How should the select query be written to get the Table output above ?
I tried :
select
period, age_group,
sum(tot_dose) over (order by age_group asc rows between unbounded preceding and current row)
FROM data
But it will provide the cumulative sum independently of the age_group criteria.
Questions
The parameters
rows between unbounded preceding and current row
should be modified to get the expected value, but not familiar with the way to set them.(Optional) The
with
"query" will group by week. However, sometime, some week will be missing as for this given week there will be no data which is normal being given the query. How could the query be rewritten to get those missing weeks with the current age group and 0 for tot_dose ?
CodePudding user response:
In case this might help someone, got it right with:
select
period, age_group,
sum(tot_dose) over (PARTITION BY age_group ORDER BY period)
FROM data
CodePudding user response:
The should query
select period,
age_group,
tot_dose,
sum(tot_dose) over(partition by age_group order by age_group rows BETWEEN
UNBOUNDED PRECEDING AND
current row)
from window_function
will produce the following results as shown in table two above