We have a load plan in ODI. We get a repeating error in some on our scenarios due to duplicate records in the I% table. What we do is manually run the script every time the load plan fails.
DELETE FROM adw12_dw. I$_1558911580_4
WHERE (EFFECTIVE_FROM_DT, DATASOURCE_NUM_ID, INTEGRATION_ID) IN
( SELECT EFFECTIVE_FROM_DT,
DATASOURCE_NUM_ID,
INTEGRATION_ID
FROM adw12_dw . I$_1558911580_4
GROUP BY EFFECTIVE_FROM_DT,
DATASOURCE_NUM_ID,
INTEGRATION_ID
HAVING COUNT (1) > 1)
AND ROWID NOT IN
( SELECT MIN (ROWID)
FROM adw12_dw . I$_1558911580_4
GROUP BY EFFECTIVE_FROM_DT,
DATASOURCE_NUM_ID,
INTEGRATION_ID
HAVING COUNT (1) > 1)
commit;
Is there a way to automate the deletion of duplicate records in the Integration table?
CodePudding user response:
If you have duplicates in the source, best would be to handle that in the logic of the mapping.
What could work is to add an expression component to add a row_rank
column using an analytical function to rank the duplicates : row_number() over (partition by EFFECTIVE_FROM_DT, DATASOURCE_NUM_ID, INTEGRATION_ID order by ROWID)
.
You can then add a filter with the condition row_rank = 1
.
If you prefer to do a delete after inserting, you can edit the IKM and add the delete step before loading the target table.
You could also divide the integration in 3 different steps :
- a mapping that would load a staging table instead of your final target table, with the duplicate
- an ODI procedure that would perform the delete to remove the duplicates in the staging table
- a mapping that would load the data from the staging area to the target table
CodePudding user response:
Maybe your duplicate counts over 2 because we must execute the delete query recursively. For example:
CREATE OR REPLACE PROCEDURE delete_dublicates
IS
BEGIN
DELETE FROM TABLE1 WHERE ID IN
(
SELECT max(ID) FROM TABLE1
GROUP BY USER_ID, TYPE_ID
HAVING count(*) > 1
);
IF (SQL%ROWCOUNT > 0) THEN
delete_dublicates;
END IF;
END delete_dublicates;