Oracle delete and update billions of duplicate records from a table-CodePudding

Versioned Table has duplicate address records and needs to be deleted as below 1: Find duplicate records from a table as below Address

adr_id | ver_id  | address
1       1      newYork
1       2      newYork
1       3      newYork
4       1      Washington
4       2      Washington

2: Insert new records as below

adr_id | ver_id  | address
11    0    newYork
12    0    Washington

3: Delete existing duplicate records so final table would look like #2.

Note: Table has billions of records and required to be done in minimum time with best database technique.

CodePudding user response：

Use a MERGE statement correlated on the ROWID pseudo-column:

MERGE INTO table_name dst
USING (
  SELECT ROWID AS rid,
         COUNT(*) OVER (PARTITION BY adr_id) AS cnt,
         ROW_NUMBER() OVER (PARTITION BY adr_id ORDER BY ver_id) AS rn
  FROM   table_name
) src
ON (src.cnt > 1 AND dst.ROWID = src.rid)
WHEN MATCHED THEN
  UPDATE
  SET   adr_id = YOUR_ADR_ID_SEQUENCE.NEXTVAL,
        ver_id = 0
  DELETE WHERE rn > 1;

Which, for the sample data:

CREATE SEQUENCE your_adr_id_sequence START WITH 11;

CREATE TABLE table_name (adr_id, ver_id, address) AS
SELECT 1, 1, 'newYork' FROM DUAL UNION ALL
SELECT 1, 2, 'newYork' FROM DUAL UNION ALL
SELECT 1, 3, 'newYork' FROM DUAL UNION ALL
SELECT 4, 1, 'Washington' FROM DUAL UNION ALL
SELECT 4, 2, 'Washington' FROM DUAL;

Then, after the MERGE the table contains:

ADR_ID	VER_ID	ADDRESS
11	0	newYork
14	0	Washington

fiddle