Home > Software engineering >  SQLite Query to get last record for each id before inputted datetime
SQLite Query to get last record for each id before inputted datetime

Time:08-11

I have a SQLite table of security prices that have timestamps from when each price was recorded. I'm trying to write a query were I can pass an arbitrary price_datetime input, say 2022-08-10 19:000:00.000000, and pull the latest price for each security_id that is <= the inputed price_datetime.

Below is a sample table:

┌─────────────┬───────┬────────────────────────────┐
│ security_id │ price │       price_datetime       │
├─────────────┼───────┼────────────────────────────┤
│ 4           │ 90.0  │ 2022-08-11 20:00:00.000000 │
│ 1           │ 100.0 │ 2022-08-10 20:00:00.000000 │
│ 1           │ 90.0  │ 2022-08-10 19:00:00.000000 │
│ 2           │ 95.0  │ 2022-08-10 18:00:00.000000 │
│ 2           │ 90.0  │ 2022-08-09 20:00:00.000000 │
│ 1           │ 90.0  │ 2022-08-09 20:00:00.000000 │
│ 3           │ 90.0  │ 2022-08-08 20:00:00.000000 │
└─────────────┴───────┴────────────────────────────┘

The input 2022-08-10 19:000:00.000000 would return below:

┌─────────────┬───────┬────────────────────────────┐
│ security_id │ price │       price_datetime       │
├─────────────┼───────┼────────────────────────────┤
│ 4           │ null  │ null                       │
│ 1           │ 90.0  │ 2022-08-10 19:00:00.000000 │
│ 2           │ 95.0  │ 2022-08-10 18:00:00.000000 │
│ 3           │ 90.0  │ 2022-08-08 20:00:00.000000 │
└─────────────┴───────┴────────────────────────────┘

security_id 4 returns null since there is no price record with a price_datetime <= 2022-08-10 19:000:00.000000. The query should also have the flexibility to return the last 1, 2, etc. records <= the inputted price_datetime in addition to just the last.

Code to generate the sample table:

CREATE TABLE prices (
    security_id INTEGER,
    price FLOAT NOT NULL, 
    price_datetime DATETIME NOT NULL

);

CREATE INDEX ix_prices_price_datetime on "prices" (price_datetime);

INSERT INTO prices VALUES(1,100,'2022-08-10 20:00:00.000000'),(1,90,'2022-08-10 19:00:00.000000'),(1,90,'2022-08-09 20:00:00.000000'),(2,90,'2022-08-09 20:00:00.000000'),(2,95,'2022-08-10 18:00:00.000000'),(3,90,'2022-08-08 20:00:00.000000'),(4,90,'2022-08-11 20:00:00.000000');

SELECT * FROM prices ORDER BY price_datetime DESC;

CodePudding user response:

WITH
    query_input(price_datetime, row_count) AS (
        VALUES (
            '2022-08-10 19:000:00.000000', -- @price_datetime,
            2                              -- @row_count
        )
    ),
    -- Remove this CTE when running against your db containing the actual "prices" table.
    prices(security_id, price, price_datetime) AS (
        VALUES 
            (1,  99,  '2022-08-10 20:00:00.000000'),  
            (1,  90,  '2022-08-10 19:00:00.000000'),  
            (1,  90,  '2022-08-09 20:00:00.000000'),  
            (2,  90,  '2022-08-09 20:00:00.000000'),  
            (2,  95,  '2022-08-10 18:00:00.000000'),  
            (3,  90,  '2022-08-08 20:00:00.000000'),  
            (4,  90,  '2022-08-11 20:00:00.000000')
    ),
    -- As you requested to show every security, you need to collect all id's.
    -- I assume you define table "securities" containing column "id", so that
    -- prices.security_id => securities.id. Remove the following query when
    -- running against your db and adjust the "ids" CTE below if necessary.
    securities(id) AS (VALUES (1), (2), (3), (4)),
    ids(id) AS (
        -- Adjust "id" and "securities" to match your schema
        SELECT id FROM securities
    ),
    filtered_prices AS (
        SELECT prices.*
        FROM prices, query_input
        WHERE prices.price_datetime <= query_input.price_datetime
    ),
    label_prices AS (
        SELECT
            fp.*,
            row_number() OVER (PARTITION BY fp.security_id ORDER BY fp.price_datetime DESC) AS group_id
        FROM filtered_prices AS fp
    ),
    last_n_existing_prices AS (
        SELECT lp.*
        FROM label_prices AS lp, query_input AS qi
        WHERE lp.group_id <= qi.row_count
    ),
    last_n_prices AS (
        SELECT ids.id AS security_id, lp.price, lp.price_datetime
        FROM ids
        LEFT JOIN last_n_existing_prices AS lp
        ON ids.id = lp.security_id
        ORDER BY ids.id, lp.price_datetime DESC
    )
SELECT * FROM last_n_prices;

CodePudding user response:

You need a LEFT join of the distinct security_ids to the table.

For the last price of each security_id the simplest way to get it is with aggregation, using SQLite's feature of bare columns:

SELECT s.security_id, p.price, MAX(p.price_datetime) price_datetime
FROM (SELECT DISTINCT security_id FROM prices) s
LEFT JOIN prices p ON p.security_id = s.security_id AND p.price_datetime <= '2022-08-10 19:000:00.000000'
GROUP BY s.security_id;

For the last n rows use ROW_NUMBER() window function:

WITH cte AS (
  SELECT s.security_id, p.price, p.price_datetime,
         ROW_NUMBER() OVER (PARTITION BY s.security_id ORDER BY p.price_datetime DESC) rn
  FROM (SELECT DISTINCT security_id FROM prices) s
  LEFT JOIN prices p ON p.security_id = s.security_id AND p.price_datetime <= '2022-08-10 19:000:00.000000'
)  
SELECT security_id, price, price_datetime
FROM cte
WHERE rn <= n; -- replace n with the number that you want

See the demo.

  • Related