Select latest available SQL entry state-CodePudding

Consider this DDL:

CREATE TABLE cash_depot_state 
(
    id INTEGER NOT NULL PRIMARY KEY,
    date DATE,
    amount REAL,
    cash_depot_id INTEGER
);
    
INSERT INTO cash_depot_state (date, amount, cash_depot_id) 
VALUES (DATE('2022-03-02'), 382489, 5);
INSERT INTO cash_depot_state (date, amount, cash_depot_id) 
VALUES (DATE('2022-03-03'), 750, 2);
INSERT INTO cash_depot_state (date, amount, cash_depot_id) 
VALUES (DATE('2022-03-04'), 750, 3);
INSERT INTO cash_depot_state (date, amount, cash_depot_id) 
VALUES (DATE('2022-03-05'), 0, 5);

For an array of dates I need to select sum of all cash depots' actual amounts:

2022-03-01 - no data available - expect 0
2022-03-02 - cash depot #5 has changed it's value to 382489 - expect 382489
2022-03-03 - cash depot #2 has changed it's value to 750 - expect 382489 750
2022-03-03 - cash depot #3 has changed it's value to 750 - expect 382489 750 750
2022-03-04 - cash depot #5 has changed it's value to 0 - expect 0 750 750

My best attempt: http://sqlfiddle.com/#!5/94ad0d/1

But I can't figure out how to pick winner of a subgroup

CodePudding user response：

You could define the latest amount per cash depot as the record that has row number 1, when you divvy up records by cash_depot_id, and order them descending by date:

SELECT 
    id,
    cash_depot_id,
    date,
    amount,
    ROW_NUMBER() OVER (PARTITION BY cash_depot_id ORDER BY date DESC) rn
FROM
    cash_depot_state

This will highlight the latest data from your table - all the relevant rows will have rn = 1:

id	cash_depot_id	date	amount	rn
2	2	2022-03-03	750.0	1
3	3	2022-03-04	750.0	1
4	5	2022-03-05	0.0	1
1	5	2022-03-02	382489.0	2

Now you can use a WHERE clause to filter records to a certain date, e.g. WHERE data <= '2022-03-05':

SELECT
    SUM(amount) sum_amount
FROM
    (
        SELECT amount, ROW_NUMBER() OVER (PARTITION BY cash_depot_id ORDER BY date DESC) rn
        FROM   cash_depot_state 
        WHERE  date <= '2022-03-05'
    ) latest
WHERE
    rn = 1;

will return 1500.

A more traditional way to solve this would be a correlated sub-query:

SELECT
    SUM(amount) sum_amount
FROM
    cash_depot_state s
WHERE
    date = (
        SELECT MAX(date)
        FROM   cash_depot_state
        WHERE  date <= '2022-03-05' AND cash_depot_id = s.cash_depot_id
    )

or a join against a materialized sub-query:

SELECT
    SUM(amount) sum_amount
FROM
    cash_depot_state s
    INNER JOIN (
        SELECT   MAX(date) date, cash_depot_id
        FROM     cash_depot_state
        WHERE    date <= '2022-03-05' 
        GROUP BY cash_depot_id
    ) latest ON latest.cash_depot_id = s.cash_depot_id AND latest.date = s.date

In large tables, these are potentially faster than the ROW_NUMBER() variant. YMMV, take measurements.

An index that covers date, cash_depot_id, and amount helps all shown approaches:

CREATE INDEX ix_latest_cash ON cash_depot_state (date DESC, cash_depot_id ASC, amount);