Home > front end >  Delta Detection in Oracle table with 2 billion records using composite key
Delta Detection in Oracle table with 2 billion records using composite key

Time:01-07

Initial Load on Day 1

id key fkid
1 0 100
1 1 200
2 0 300

Load on Day 2

id key fkid
1 0 100
1 1 200
2 0 300
3 1 400
4 0 500

Need to find delta records Load on Day 2

id key address
3 1 400
4 0 500

Problem Statement Need to find delta records in minimum time with following facts 1: I have to process around 2 billion records initially from a table as mentioned below 2: Also need to find delta with minimal time so that I can process it quickly

Questions : 1: Will it be a time consuming process to identify delta especially during production downtime ? 2: How long should it take to identify delta with 3 numeric columns in a table out of which id & key forms a composite key.

Solution tried : 1: Use full join and extract delta with case nvl condition but looks to be costly.

    nvl(node1.id, node2.id) id,
    nvl(node1.key, node2.key) key,
    nvl(node1.fkid, node2.fkid) fkid
FROM
    TABLE_DAY_1       node1
    FULL JOIN TABLE_DAY_2   node2 ON node2.id = node1.id
WHERE
    node2.id IS NULL
    OR node1.id IS NULL;```

CodePudding user response:

You need two separate statements to handle this, one to detect new & changed rows, a separate one to detect deleted rows.

While it is cumberson to write, the fastest comparison is field-by-field, so:

SELECT /*  parallel(8) full(node1) full(node2) USE_HASH(node1 node) */ *
  FROM table_day_1 node1,
       table_day_2 node2
 WHERE node1.id = node2.id( )
   AND (node2.id IS NULL -- new rows
        OR node1.col1 <> node2.col2 -- changed val on non-nullable col
        OR NVL(node1.col3,' ') <> NVL(node2.col3,' ') -- changed val on nullable string
        OR NVL(node1.col4,-1) <> NVL(node2.col4,-1) -- changed val on nullable numeric, etc..
       )

Then for deleted rows:

SELECT /*  parallel(8) full(node1) full(node2) USE_HASH(node1 node) */ node2.id
  FROM table_day_1 node1,
       table_day_2 node2
 WHERE node1.id( ) = node2.id
   AND node1.id IS NULL -- deleted rows

You will want to make sure Oracle does a full table scan. If you have lots of CPUs and parallel query is enabled on your database, make sure the query uses parallel query (hence the hint). And you want a hash join between them. Work with your DBA to ensure you have enough temporary space to pull this off, and enough PGA to at least handle this with a single pass workarea rather than multipass.

  • Related