Home > Back-end >  Postgresql: DELETE from multiple tables based on the same SELECT query
Postgresql: DELETE from multiple tables based on the same SELECT query

Time:11-08

My table structure is as described in this post:

 name | version | processed | processing | updated  | ref_time 
------ --------- ----------- ------------ ---------- ----------
 abc  |       1 | t         | f          | 27794395 | 27794160
 def  |       1 | t         | f          | 27794395 | 27793440
 ghi  |       1 | t         | f          | 27794395 | 27793440
 jkl  |       1 | f         | f          | 27794395 | 27794160
 mno  |       1 | t         | f          | 27794395 | 27793440
 pqr  |       1 | f         | t          | 27794395 | 27794160

Based on this answer, I am deriving a list of ref_time values which I want to use as a basis for deleting 'old' entries from status_table.

This is the query to generate the list of relevant ref_time values:

WITH main AS
(
    SELECT ref_time,
        ROUND(AVG(processed::int) * 100, 1) percent
    FROM status_table
    GROUP BY ref_time ORDER BY ref_time DESC, percent DESC
)
SELECT ref_time FROM main WHERE percent=100 OFFSET 2;

For example this might return:

 ref_time 
----------
 27794880
 27794160

I can then use this to DELETE all relevant entries in the status_table:

DELETE FROM status_table
WHERE ref_time IN 
(
    WITH main AS
    (
        SELECT ref_time,
            ROUND(AVG(processed::int) * 100, 1) percent
        FROM status_table
        GROUP BY ref_time ORDER BY ref_time DESC, percent DESC
    )
    SELECT ref_time FROM main WHERE percent=100 OFFSET 2
);

But I have another table named data_table, which also has a ref_time column, and I want to DELETE entries from that table on the same basis, i.e. any rows having ref_time in the above list.

How do I achieve this without duplicating the query used to generate the ref_time list?

CodePudding user response:

You can use common table expressions:

with 
    ref as (
        select ref_time 
        from status_table 
        group by ref_time 
        having bool_and(processed)
        order by ref_time desc limit 2
    ),
    del_ref as (
        delete from status_table s
        using ref r
        where s.ref_time = r.ref_time
    )
delete from data_table d
using ref r
where d.ref_time = r.ref_time

The first CTE,ref, returns the list of timestamps that you want to delete from the two other tables. I attempted to simplify the logic: you seem to want the top 2 timestamps that are fully processed (note that offset skips that many rows from the resultset, which is different than limit).

The second CTE deletes from status_table, and the last part of the query addresses data_table.

  • Related