I have data in two tables. City and Address.
City
id - name - code
1 - abc - 1
2 - xyz - 2
3 - efg - 2
Address
id - city_id - name
1 - 1 - wer
2 - 2 - sdf
3 - 3 - tyu
Now code for cities with id 2 and 3 are the same. I want to delete the city with id 3 and change city_id 3 to 2 in the Address table so that the last two rows of Address table are associated with city id 2
In other words, I want to delete duplicate rows in the parent table and update the child table accordingly. There are thousands of such rows in my 2 tables.
CodePudding user response:
Create a table of pairs to be processed (cc), then update Address and delete from City accordingly
select c1.id idto, c2.id idfrom into cc
from (
select id, code, row_number() over(partition by code order by id) rn
from city
) c1
join city c2 on c1.rn = 1 and c1.code = c2.code and c2.id > c1.id;
update Address a
set city_id = cc.idto
from cc
where a.city_id=cc.idfrom;
delete from city
where id in (select idfrom from cc);
CodePudding user response:
If you are going to do this multiple times then I would suggest you to create a Stored Procedure and call it whenever you need it.
CREATE OR REPLACE PROCEDURE delete_dulicate_city()
LANGUAGE 'plpgsql'
AS $$
BEGIN
DROP TABLE IF EXISTS temp_city;
CREATE TEMPORARY TABLE IF NOT EXISTS temp_city
(
id_to INT,
id_from INT
);
INSERT INTO temp_city
SELECT c1.id id_to, c2.id id_from
FROM
(
SELECT id, code, row_number() over(partition BY code ORDER BY id) rn
FROM city
) c1
INNER JOIN city c2
ON c1.rn = 1
AND c1.code=c2.code
AND c2.id > c1.id;
UPDATE address a
SET city_id = tc.id_to
FROM temp_city tc
WHERE a.city_id=tc.id_from;
DELETE FROM city
WHERE id IN (SELECT id_from FROM temp_city);
END;
$$;
You can call it whenever you need just by executing CALL delete_duplicate_city();