I've a data feed that updates daily and I need to insert it into a table. I'd like to update rows with the same id but with different data, add non existent data and delete rows that don't exist anymore. I'm having problems with the last task, I can't seem to delete the rows from previous runs that don't exist in the new feed anymore.
This is what I have, the "listings" table with some test data:
INSERT INTO public.listings(
id, price, pub_date, rooms, publisher, updated_date)
VALUES
(1, 100, '2021-05-03', 3, 'mf', '2021-01-01'),
(2, 200, '2021-05-03', 4, 'pmf', '2021-01-01'),
(3, 300, '2021-05-04', 5, 'jorge', '2021-01-01'),
(4, 400, '2021-05-05', 3, 'yuyu', '2021-01-01'),
(5, 500, '2021-05-06', 3, 'papo', '2021-01-01');
A new insert clause (update to the feed):
INSERT INTO listings AS p (id, price, pub_date, rooms, publisher, updated_date)
VALUES
(1, 125, '2021-04-30', 4, 'mf', '2021-10-30')
, (2, 200, '2021-05-03', 4, 'pmf', '2021-01-01')
, (6, 250, '2021-04-30', 3, 'mf', '2021-10-30')
ON CONFLICT (id) -- simple clause
DO UPDATE
SET pub_date = EXCLUDED.pub_date
, price = EXCLUDED.price
, rooms = EXCLUDED.rooms
, updated_date = EXCLUDED.updated_date
WHERE (p.pub_date, p.price, p.rooms, p.updated_date) IS DISTINCT FROM
(EXCLUDED.pub_date, EXCLUDED.price, EXCLUDED.rooms, EXCLUDED.updated_date);
And this is the trigger:
CREATE OR REPLACE FUNCTION trg_prop_backup()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
INSERT INTO old_listings
VALUES (OLD.*);
DELETE FROM listings --this sentence generates an error on id duplicity
WHERE listings.id = OLD.id;
RETURN NEW;
END
$func$;
CREATE TRIGGER prop_listings_upd_bef
BEFORE UPDATE OR DELETE ON listings
FOR EACH ROW EXECUTE FUNCTION trg_prop_backup();
This already works to update the existing rows and insert the new ones, but it doesn't work for the rows that are not existant anymore on the update feed. There's something I'm doing wrong (or not doing) but I can't seem to come up with the solution.
The "old_listings" table contains the rows that doesn't exist on the new feed, because the data is updated or deleted.
The result of running the two insert queries has to be (for the listings table):
(1, 125, '2021-04-30', 4, 'mf', '2021-10-30')
, (2, 200, '2021-05-03', 4, 'pmf', '2021-01-01')
, (6, 250, '2021-04-30', 3, 'mf', '2021-10-30')
And the old_listings table:
(1, 100, '2021-05-03', 3, 'mf', '2021-01-01'),
(3, 300, '2021-05-04', 5, 'jorge', '2021-01-01'),
(4, 400, '2021-05-05', 3, 'yuyu', '2021-01-01'),
(5, 500, '2021-05-06', 3, 'papo', '2021-01-01');
CodePudding user response:
Here is my suggestion:
- Add an extra timestamp field
series_ts
in tablelistings
and set its value tocurrent_timestamp
upon insert or update; - remove the
where
clause of yourinsert on conflict update
query so thatseries_ts
will be updated even for equivalent records; - delete all
listings
records whoseseries_ts
is not equal tocurrent_timestamp
in the same transaction with the insert query.
In order to do this first create a 'before insert or update for each row' trigger that executes this function:
create function set_listings_series_ts() returns trigger language plpgsql as $$
begin
new.series_ts := current_timestamp;
return new;
end; $$;
After you have this in place run:
BEGIN TRANSACTION;
INSERT INTO listings AS p (id, price, pub_date, rooms, publisher, updated_date)
VALUES
(1, 125, '2021-04-30', 4, 'mf', '2021-10-30')
, (2, 200, '2021-05-03', 4, 'pmf', '2021-01-01')
, (6, 250, '2021-04-30', 3, 'mf', '2021-10-30')
ON CONFLICT (id) -- simple clause
DO UPDATE
SET pub_date = EXCLUDED.pub_date
, price = EXCLUDED.price
, rooms = EXCLUDED.rooms
, updated_date = EXCLUDED.updated_date; -- your query w/o WHERE
WITH t as
(
delete from listings
where series_ts < current_timestamp
returning *
)
insert into old_listings select * from t; -- delete old data, save in old_listings
COMMIT TRANSACTION;
You may do this w/o a trigger too if you change the 'insert ... update' query.
CodePudding user response:
My suggestion :
First delete all the rows in the table listings
, and create the corresponding rows in the table old_listings
:
WITH list AS
( DELETE FROM listings
WHERE true
RETURNING id, price, pub_date, rooms, publisher, updated_date
)
INSERT INTO old_listings (id, price, pub_date, rooms, publisher, updated_date)
SELECT id, price, pub_date, rooms, publisher, updated_date
FROM list ;
Then insert the new rows in table listings
and delete the corresponding rows in the table old_listing
by trigger :
INSERT INTO listings AS p (id, price, pub_date, rooms, publisher, updated_date)
VALUES
(1, 125, '2021-04-30', 4, 'mf', '2021-10-30')
, (2, 200, '2021-05-03', 4, 'pmf', '2021-01-01')
, (6, 250, '2021-04-30', 3, 'mf', '2021-10-30')
CREATE OR REPLACE FUNCTION trg_del_bacckup()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
DELETE FROM old_listings
WHERE id = NEW.id;
RETURN NEW;
END
$func$;
CREATE TRIGGER trg_del_bacckup
AFTER INSERT ON listings
FOR EACH ROW EXECUTE FUNCTION trg_del_bacckup();
PS : you also have to manage the conflicts in the table old_listings
when successive feed try to insert rows with ids which already exist