I have state data from IoT devices (home assistant) logged multiple times a day (depending on device) and retained for 1 year. I'm trying (for devices that log frequently) to keep 1 month of every state change, but for data older than 1 month delete every other state change.
Excerpt from database:
state_id | entity_id | last_updated |
---|---|---|
2342932 | sensor.climate_outside_humidity | 2022-11-12 04:13:46.598786 |
2063613 | sensor.climate_outside_humidity | 2022-10-28 03:02:47.756064 |
1984952 | sensor.climate_outside_temperature | 2022-10-20 07:32:51.674016 |
925115 | sensor.climate_outside_humidity | 2022-07-25 09:54:01.095297 |
1897854 | sensor.climate_outside_humidity | 2022-10-11 17:28:13.448728 |
2005628 | sensor.climate_outside_temperature | 2022-10-22 12:37:21.027465 |
1071454 | sensor.climate_outside_humidity | 2022-08-04 13:16:02.885636 |
1663793 | sensor.climate_outside_temperature | 2022-09-17 14:36:05.900979 |
1756081 | sensor.climate_outside_temperature | 2022-09-27 23:17:25.688069 |
2372362 | sensor.climate_outside_temperature | 2022-11-14 02:28:16.034873 |
My query so far :
SELECT state_id, entity_id, last_updated
FROM "states"
WHERE entity_id IN ("sensor.climate_outside_temperature","sensor.climate_outside_humidity")
AND last_updated < date('now', '-30 day')
ORDER BY entity_id,state_id DESC
I'm sorting by entity_id
, then state_id
(as proxy for last_updated
as it should be faster). This gives data older than 30 days I'd like to delete every nth row from:
state_id | entity_id | last_updated |
---|---|---|
1897854 | sensor.climate_outside_humidity | 2022-10-11 17:28:13.448728 |
1071454 | sensor.climate_outside_humidity | 2022-08-04 13:16:02.885636 |
925115 | sensor.climate_outside_humidity | 2022-07-25 09:54:01.095297 |
1756081 | sensor.climate_outside_temperature | 2022-09-27 23:17:25.688069 |
1663793 | sensor.climate_outside_temperature | 2022-09-17 14:36:05.900979 |
I could add modulus of rowid
to WHERE
, but rowid
isn't row number in result but primary index on source table. So that wouldn't work. I need a computed index column on result set, but I don't know how.
CodePudding user response:
Use ROW_NUMBER()
window function to get the ranking of the rows returned by your query, filter the results to get only the even rows and delete only the filtered rows:
WITH
numbers AS (
SELECT state_id, ROW_NUMBER() OVER (ORDER BY state_id DESC) AS rn
FROM states
WHERE entity_id IN ('sensor.climate_outside_temperature', 'sensor.climate_outside_humidity')
AND last_updated < date('now', '-30 day')
),
ids AS (SELECT state_id FROM numbers WHERE rn % 2 = 0)
DELETE
FROM states
WHERE state_id IN ids;
See the demo.