Consider this DDL:
CREATE TABLE cash_depot_state
(
id INTEGER NOT NULL PRIMARY KEY,
date DATE,
amount REAL,
cash_depot_id INTEGER
);
INSERT INTO cash_depot_state (date, amount, cash_depot_id)
VALUES (DATE('2022-03-02'), 382489, 5);
INSERT INTO cash_depot_state (date, amount, cash_depot_id)
VALUES (DATE('2022-03-03'), 750, 2);
INSERT INTO cash_depot_state (date, amount, cash_depot_id)
VALUES (DATE('2022-03-04'), 750, 3);
INSERT INTO cash_depot_state (date, amount, cash_depot_id)
VALUES (DATE('2022-03-05'), 0, 5);
For an array of dates I need to select sum of all cash depots' actual amounts:
- 2022-03-01 - no data available - expect 0
- 2022-03-02 - cash depot #5 has changed it's value to 382489 - expect 382489
- 2022-03-03 - cash depot #2 has changed it's value to 750 - expect 382489 750
- 2022-03-03 - cash depot #3 has changed it's value to 750 - expect 382489 750 750
- 2022-03-04 - cash depot #5 has changed it's value to 0 - expect 0 750 750
My best attempt: http://sqlfiddle.com/#!5/94ad0d/1
But I can't figure out how to pick winner of a subgroup
CodePudding user response:
You could define the latest amount per cash depot as the record that has row number 1, when you divvy up records by cash_depot_id
, and order them descending by date
:
SELECT
id,
cash_depot_id,
date,
amount,
ROW_NUMBER() OVER (PARTITION BY cash_depot_id ORDER BY date DESC) rn
FROM
cash_depot_state
This will highlight the latest data from your table - all the relevant rows will have rn = 1
:
id | cash_depot_id | date | amount | rn |
---|---|---|---|---|
2 | 2 | 2022-03-03 | 750.0 | 1 |
3 | 3 | 2022-03-04 | 750.0 | 1 |
4 | 5 | 2022-03-05 | 0.0 | 1 |
1 | 5 | 2022-03-02 | 382489.0 | 2 |
Now you can use a WHERE
clause to filter records to a certain date, e.g. WHERE data <= '2022-03-05'
:
SELECT
SUM(amount) sum_amount
FROM
(
SELECT amount, ROW_NUMBER() OVER (PARTITION BY cash_depot_id ORDER BY date DESC) rn
FROM cash_depot_state
WHERE date <= '2022-03-05'
) latest
WHERE
rn = 1;
will return 1500
.
A more traditional way to solve this would be a correlated sub-query:
SELECT
SUM(amount) sum_amount
FROM
cash_depot_state s
WHERE
date = (
SELECT MAX(date)
FROM cash_depot_state
WHERE date <= '2022-03-05' AND cash_depot_id = s.cash_depot_id
)
or a join against a materialized sub-query:
SELECT
SUM(amount) sum_amount
FROM
cash_depot_state s
INNER JOIN (
SELECT MAX(date) date, cash_depot_id
FROM cash_depot_state
WHERE date <= '2022-03-05'
GROUP BY cash_depot_id
) latest ON latest.cash_depot_id = s.cash_depot_id AND latest.date = s.date
In large tables, these are potentially faster than the ROW_NUMBER()
variant. YMMV, take measurements.
An index that covers date
, cash_depot_id
, and amount
helps all shown approaches:
CREATE INDEX ix_latest_cash ON cash_depot_state (date DESC, cash_depot_id ASC, amount);