Home > Back-end >  How to delete every nth row from result set?
How to delete every nth row from result set?

Time:11-19

I have state data from IoT devices (home assistant) logged multiple times a day (depending on device) and retained for 1 year. I'm trying (for devices that log frequently) to keep 1 month of every state change, but for data older than 1 month delete every other state change.

Excerpt from database:

state_id entity_id last_updated
2342932 sensor.climate_outside_humidity 2022-11-12 04:13:46.598786
2063613 sensor.climate_outside_humidity 2022-10-28 03:02:47.756064
1984952 sensor.climate_outside_temperature 2022-10-20 07:32:51.674016
925115 sensor.climate_outside_humidity 2022-07-25 09:54:01.095297
1897854 sensor.climate_outside_humidity 2022-10-11 17:28:13.448728
2005628 sensor.climate_outside_temperature 2022-10-22 12:37:21.027465
1071454 sensor.climate_outside_humidity 2022-08-04 13:16:02.885636
1663793 sensor.climate_outside_temperature 2022-09-17 14:36:05.900979
1756081 sensor.climate_outside_temperature 2022-09-27 23:17:25.688069
2372362 sensor.climate_outside_temperature 2022-11-14 02:28:16.034873

My query so far :

SELECT state_id, entity_id, last_updated
FROM "states"
WHERE entity_id IN ("sensor.climate_outside_temperature","sensor.climate_outside_humidity")
AND last_updated < date('now', '-30 day')
ORDER BY entity_id,state_id DESC

I'm sorting by entity_id, then state_id (as proxy for last_updated as it should be faster). This gives data older than 30 days I'd like to delete every nth row from:

state_id entity_id last_updated
1897854 sensor.climate_outside_humidity 2022-10-11 17:28:13.448728
1071454 sensor.climate_outside_humidity 2022-08-04 13:16:02.885636
925115 sensor.climate_outside_humidity 2022-07-25 09:54:01.095297
1756081 sensor.climate_outside_temperature 2022-09-27 23:17:25.688069
1663793 sensor.climate_outside_temperature 2022-09-17 14:36:05.900979

I could add modulus of rowid to WHERE, but rowid isn't row number in result but primary index on source table. So that wouldn't work. I need a computed index column on result set, but I don't know how.

CodePudding user response:

Use ROW_NUMBER() window function to get the ranking of the rows returned by your query, filter the results to get only the even rows and delete only the filtered rows:

WITH 
  numbers AS (
    SELECT state_id, ROW_NUMBER() OVER (ORDER BY state_id DESC) AS rn
    FROM states 
    WHERE entity_id IN ('sensor.climate_outside_temperature', 'sensor.climate_outside_humidity') 
      AND last_updated < date('now', '-30 day') 
  ),
  ids AS (SELECT state_id FROM numbers WHERE rn % 2 = 0)
DELETE
FROM states
WHERE state_id IN ids;

See the demo.

  • Related