I have SCD table type 2 that I join with various other tables and I am looking to aggregate sum total from any entity that was active (by active I mean the ones that don't yet have an end_date) during an individual month.
currently, I have a query similar to this (let's say aggregating data for the month of May 2022 and April 2022):
select
count(1) as enitities_agg,
DATE_TRUNC('Month', h.start) as date,
sum(h.price) filter (where c.name='HIGH') as sum_total,
----....----
from
project as p
join class as c on p.class_id = c.id
join stage as s on s.project_id = p.id
join stage_info as si on si.stage_id = s.id
join history as h on h.stage_info_id = si.id
where
h.start <= '2022-06-01' and
h.end_date >= '2022-04-01' and
COALESCE(p.end_date, '2099-01-01') >= '2022-04-01' and
COALESCE(p.start_date, '2099-01-01') <= '2022-06-01' and
COALESCE(stage.end, '2099-01-01') >= '2022-04-01' and
h.price is not null and
h.price != 0
group by DATE_TRUNC('Month', h.start)
It aggregates fine only those whose history starts in May or April, not the ones that overlap those months and are still active.
The problem I have is that some history entities start in April, March, etc., and still haven't ended by May. Because I group with group by DATE_TRUNC('Month', h.start)
i.e. by history start date, I don't get the entities that start earlier than April or May and continue to be active after May, I get aggregates only in those months that they have started in.
I was trying to do it by generating series and group by the generated month, however, I didn't find a way that would group them correctly. Example, of one experimentation that I tried.
from
generate_series('2022-03-01', '2022-07-01', INTERVAL '1 month') as mt
join project as p on COALESCE(p.end_date, '2099-01-01') >= mt and
COALESCE(p.start_date, '2099-01-01') <= mt INTERVAL '1 month'
join class as c on p.class_id = c.id
join stage as stage on stage.project_id = p.id and
COALESCE(stage.end, '2099-01-01') >= mt
join stage_info as si on si.stage_id = stage.id
join history as h on h.stage_info_id = si.id
where
h.start <= mt and
h.end_date >= mt INTERVAL '1 month' and
h.price is not null and
h.price != 0
group by mt
How would it be possible to iterate through the history table aggregating any active entities in a month and group them by the same month and get something like this?
"enitities_agg" | "date" | "sum_total"
832 | "2022-04-01 00:00:00" | 15432234
1020 | "2022-05-01 00:00:00" | 19979458
CodePudding user response:
You want all history entities was happened during 2022-May? If so, the following maybe help.
daterange(h.start,
h.end_date,
'[]') && daterange('2022-05-01', '2022-06-01', '[]');
demo:
CREATE temp TABLE test (
begin_ date,
_end date
);
INSERT INTO test
VALUES ('2022-01-01', '2022-05-01');
INSERT INTO test
VALUES ('2022-01-01', '2022-05-11');
INSERT INTO test
VALUES ('2022-05-01', '2022-07-11');
INSERT INTO test
VALUES ('2022-06-11', '2022-07-11');
SELECT
*,
daterange(begin_, _end, '[]')
FROM
test t
WHERE
daterange(t.begin_, t._end, '[]') && daterange('2022-05-01', '2022-05-31', '[]');
&& range operator reference: https://www.postgresql.org/docs/current/functions-range.html#RANGE-OPERATORS-TABLE
CodePudding user response:
Seems your logic is: if any day of begin_ - _end interval falls into month, count it in. This was the hardest part to guess from the desired results.
So I guess you need this:
with dim as (
select
m::date as month_start
,(date_trunc('month', m) interval '1 month - 1 day')::date as month_end
,to_char(date_trunc('month', m), 'Mon') as month
from generate_series('2022-01-01', '2022-08-01', INTERVAL '1 month') as m
)
SELECT
dim.month
, sum(coalesce(t.price, 0)) as sum_price
FROM dim
left join test as t
on t.begin_ <= dim.month_end
and t._end >= dim.month_start
group by dim.month_start, dim.month
order by dim.month_start, dim.month
;
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=614030d4db5e03876f693a9a8a5ff122