Delete Duplicate in I$ table in ODI-CodePudding

We have a load plan in ODI. We get a repeating error in some on our scenarios due to duplicate records in the I% table. What we do is manually run the script every time the load plan fails.

DELETE FROM adw12_dw. I$_1558911580_4
      WHERE     (EFFECTIVE_FROM_DT, DATASOURCE_NUM_ID, INTEGRATION_ID) IN
                    (  SELECT EFFECTIVE_FROM_DT,
                              DATASOURCE_NUM_ID,
                              INTEGRATION_ID
                         FROM adw12_dw . I$_1558911580_4
                     GROUP BY EFFECTIVE_FROM_DT,
                              DATASOURCE_NUM_ID,
                              INTEGRATION_ID
                       HAVING COUNT (1) > 1)
            AND ROWID NOT IN
                    (  SELECT MIN (ROWID)
                         FROM adw12_dw . I$_1558911580_4
                     GROUP BY EFFECTIVE_FROM_DT,
                              DATASOURCE_NUM_ID,
                              INTEGRATION_ID
                       HAVING COUNT (1) > 1)
                
                commit;

Is there a way to automate the deletion of duplicate records in the Integration table?

CodePudding user response：

If you have duplicates in the source, best would be to handle that in the logic of the mapping. What could work is to add an expression component to add a row_rank column using an analytical function to rank the duplicates : row_number() over (partition by EFFECTIVE_FROM_DT, DATASOURCE_NUM_ID, INTEGRATION_ID order by ROWID). You can then add a filter with the condition row_rank = 1.

If you prefer to do a delete after inserting, you can edit the IKM and add the delete step before loading the target table.

You could also divide the integration in 3 different steps :

a mapping that would load a staging table instead of your final target table, with the duplicate
an ODI procedure that would perform the delete to remove the duplicates in the staging table
a mapping that would load the data from the staging area to the target table

CodePudding user response：

Maybe your duplicate counts over 2 because we must execute the delete query recursively. For example:

CREATE OR REPLACE PROCEDURE delete_dublicates
IS
BEGIN
    DELETE FROM TABLE1 WHERE ID IN 
        (
            SELECT max(ID) FROM TABLE1
            GROUP BY USER_ID, TYPE_ID 
            HAVING count(*) > 1
        );

    IF (SQL%ROWCOUNT > 0) THEN 
        delete_dublicates;
    END IF; 

END delete_dublicates;