detect deleted row in table that changes over time-CodePudding

Suppose I have two tables source and destination . The source table changes over time, for example from one day to the other. To give an example I will create here two tables source_before and source_after but in reality this is the very same table where some DML statements happen over time.

create table source_before (
id int,
name varchar2(40),
creation_time date
);

insert all 
into source_before values (1,'bola','01-Jan-20') 
into source_before values (2,'gol','02-Jan-21') 
into source_before values (3,'cav','02-Jan-23')
into source_before values (4,'bhf','02-Jan-25')
select * from dual;

select * from source_before;


1   bola 01-JAN-20
2   gol 02-JAN-21
3   cav 02-JAN-23
4   bhf 02-JAN-25



create table source_after (
id int,
name varchar2(40),
creation_time date
);

insert all 
into source_after values (1,'bola','01-Jan-20') 
into source_after values (2,'gol','02-Jan-21') 
into source_after values (5,'zzz','02-Jan-28')
into source_after values (6,'sss','02-Jan-25')
select * from dual;


select * from source_after;

1   bola 01-JAN-20
2   gol 02-JAN-21
5   zzz 02-JAN-28
6   sss 02-JAN-25

Now let's assume there is a destination table that has been updated by the time source_before existed , and thus contain the same data as source_before.

create table destination (
id int,
name varchar2(40),
creation_time date
);


insert all 
into destination values (1,'bola','01-Jan-20') 
into destination values (2,'gol','02-Jan-21') 
into destination values (3,'cav','02-Jan-23')
into destination values (4,'bhf','02-Jan-25')
select * from dual;

1   bola 01-JAN-20
2   gol 02-JAN-21
3   cav 02-JAN-23
4   bhf 02-JAN-25

Now if I want to update destination to the new changes that source_after created, which is the deletion of id 3 and 4 and the insertion of id 5 and 6. I can do this with this statement for the insertions, but not for the deletions.

merge into destination d
using (select * from source_after) sa on (d.id = sa.id)
when matched then update
set 
d.name = sa.name,
d.creation_time = sa.creation_time 
when not matched then
insert (
d.id,
d.name,
d.creation_time
)
values
(
sa.id,
sa.name,
sa.creation_time
);

select * from destination;

1   bola 01-JAN-20
2   gol 02-JAN-21
3   cav 02-JAN-23
4   bhf 02-JAN-25
6   sss 02-JAN-25
5   zzz 02-JAN-28

As we can see rows 5 and 6 have been inserted but the statement is unable to delete the the rows 3 and 4. So the goal is to reflect the changes and be able to capture both insertions and deletions on the source table. Therefore, the result of destination should be:

1   bola 01-JAN-20
2   gol 02-JAN-21
5   zzz 02-JAN-28
6   sss 02-JAN-25

CodePudding user response：

A Trigger can solve this issue for you

For this, one option is, in the destination table add some field(s) to track date, etc, when it was deleted/updated. I even suggest that on the large scale, may be do insert of new version of the row for Update, and mark record deleted for delete. This will allow to keep the entire history.

CREATE OR REPLACE TRIGGER TakeChangesToDestination 
AFTER UPDATE OR INSERT OR DELETE ON SourceTable
FOR EACH ROW 
  
BEGIN 
  IF INSERTING then
    -- insert into destination
  ELSEIF UPDATING Then
    -- update destination by ID OR Insert new row for tracking history
  ELSEIF DELETING Then
    -- update destination by ID - mark as DELETED
  end if;
END;

CodePudding user response：

If materialized view and db trigger can't be used then you should consider creating your own changes tracking system. Maybe you could consider defining your destination table with some additional columns.
With initial data in source_before as:

/*  Initial data
source_before 0                     
----------------------            
1   bola 01-JAN-20                  
2   gol  02-JAN-21                      
3   cav  02-JAN-23                      
4   bhf  02-JAN-25
*/

The same data are starting point for source_after so it looks the same, initialy.
You could have that initial data written in your destination table like below:

/*
  ID NAME CREATION_TIME   SNAP_ID_SEQUENCE SNAP_TIME          STATUS           
---- ---- ------------- ------------------ ------------------ -----------------
   1 bola 01-Jan-20                      0 01-Jan-23 19:00:00 INIT             
   2 gol  02-Jan-21                      0 01-Jan-23 19:00:00 INIT             
   3 cav  02-Jan-23                      0 01-Jan-23 19:00:00 INIT             
   4 bhf  02-Jan-25                      0 01-Jan-23 19:00:00 INIT             
*/

There are three columns added in destination table (you could do it differently) just as an idea of a potential solution.
SNAP_ID_SEQUENCE column's future values should be either sequence number or something similar as alternative so you could reference the right set of records to be compared with actual data in source_after table every time you take the snapshot.
Now, if your source_after table changes like in the question (rows 3 and 4 deleted and rows 5 and 6 added):

/*
after                   
----------------------
1   bola 01-JAN-20  
2   gol  02-JAN-21
5   zzz  02-JAN-28
6   sss  02-JAN-25
*/

If you run the code below there will be 6 more rows in destination table:

INSERT INTO DESTINATION (ID, NAME, CREATION_TIME, SNAP_ID_SEQUENCE, SNAP_TIME, STATUS) 
(       
        Select 
             COALESCE(a.ID, b.ID) "ID",    
             COALESCE(a.NAME, b.NAME) "NAME",
             COALESCE(a.CREATION_TIME, b.CREATION_TIME) "CREATION_TIME",
             1 "SNAP_ID_SEQUENCE",       -- next_sequence_number = 1
             SYSDATE "SNAP_TIME",
             CASE WHEN Nvl(a.ID, -1) = b.ID THEN 
                  CASE WHEN a.NAME != b.NAME THEN 'Name changed from ' || b.NAME || ' to ' || a.NAME END || '***' || 
                  CASE WHEN a.CREATION_TIME != b.CREATION_TIME THEN 'Creation changed from ' || To_Char(b.CREATION_TIME, 'dd-Mon-yy hh24:mi:ss') || ' to ' || To_Char(a.CREATION_TIME, 'dd-Mon-yy hh24:mi:ss') END
            ELSE  'ID ' || b.ID || ' ' || 'IS DELETED' 
            END "STATUS"
        From 
            destination b
        Left Join
            source_after a ON(a.ID = b.ID)   
        Where b.SNAP_ID_SEQUENCE = 0    --   next_sequence_number - 1 = 0
        UNION ALL
        Select
            a.ID, a.NAME, a.CREATION_TIME,
            1 "SNAP_ID_SEQUENCE",      -- next_sequence_number = 1
            SYSDATE "SNAP_TIME",
            'NEW ROW  *  ID=' || a.ID "STATUS"
        From    
            source_after a 
        Left Join
            destination b ON(b.ID = a.ID And b.SNAP_ID_SEQUENCE = 0 )   --   next_sequence_number - 1 = 0
        Where
            b.ID Is Null     
)   
--  
--  I n s e r t e d   r o w s :
/*  S n a p s h o t   1
        ID NAME                             CREATION_TIME SNAP_ID_SEQUENCE SNAP_TIME STATUS                                                                                                                                                
---------- -------------------------------- ------------- ---------------- --------- -------------------------
         1 bola                             01-JAN-20                    1 04-JAN-23 ***                      
         2 gol                              02-JAN-21                    1 04-JAN-23 ***                      
         3 cav                              02-JAN-23                    1 04-JAN-23 ID 3 IS DELETED          
         4 bhf                              02-JAN-25                    1 04-JAN-23 ID 4 IS DELETED          
         5 zzz                              02-JAN-28                    1 04-JAN-23 NEW ROW  *  ID=5         
         6 sss                              02-JAN-25                    1 04-JAN-23 NEW ROW  *  ID=6       
*/

Every time you do the next snapshot the value of SNAP_ID_SEQUENCE should be increased and passed to the code in the places marked in code. Note that in ON close of the second query and Where clause of the first query next_sequence number - 1 is used.
Second sample:
source_after changes: ID=6 is deleted, ID=7 is inserted and in ID=2 NAME has changed.

/*
after               
----------------------
1   bola     01-JAN-20  
2   gol_aaa  02-JAN-21
5   zzz      02-JAN-28
7   yyy      02-MAR-26
*/

If you run the above insert command but with sequence number 2 then the code and inserted rows would be like below:

INSERT INTO DESTINATION (ID, NAME, CREATION_TIME, SNAP_ID_SEQUENCE, SNAP_TIME, STATUS) 
(       
        Select 
             COALESCE(a.ID, b.ID) "ID",    
             COALESCE(a.NAME, b.NAME) "NAME",
             COALESCE(a.CREATION_TIME, b.CREATION_TIME) "CREATION_TIME",
             2 "SNAP_ID_SEQUENCE",       -- next_sequence_number = 2
             SYSDATE "SNAP_TIME",
             CASE WHEN Nvl(a.ID, -1) = b.ID THEN 
                  CASE WHEN a.NAME != b.NAME THEN 'Name changed from ' || b.NAME || ' to ' || a.NAME END || '***' || 
                  CASE WHEN a.CREATION_TIME != b.CREATION_TIME THEN 'Creation changed from ' || To_Char(b.CREATION_TIME, 'dd-Mon-yy hh24:mi:ss') || ' to ' || To_Char(a.CREATION_TIME, 'dd-Mon-yy hh24:mi:ss') END
            ELSE  'ID ' || b.ID || ' ' || 'IS DELETED' 
            END "STATUS"
        From 
            destination b
        Left Join
            source_after a ON(a.ID = b.ID)   
        Where b.SNAP_ID_SEQUENCE = 1    --   next_sequence_number - 1 = 1
        UNION ALL
        Select
            a.ID, a.NAME, a.CREATION_TIME,
            2 "SNAP_ID_SEQUENCE",      -- next_sequence_number = 2
            SYSDATE "SNAP_TIME",
            'NEW ROW  *  ID=' || a.ID "STATUS"
        From    
            source_after a 
        Left Join
            destination b ON(b.ID = a.ID And b.SNAP_ID_SEQUENCE = 1 )   --   next_sequence_number - 1 = 1
        Where
            b.ID Is Null     
)    
--  
--  I n s e r t e d   r o w s :
/*  S n a p s h o t   2
        ID NAME                             CREATION_TIME SNAP_ID_SEQUENCE SNAP_TIME STATUS                                 
---------- -------------------------------- ------------- ---------------- --------- ---------------------------------------
         1 bola                             01-JAN-20                    2 04-JAN-23 ***                                    
         2 gol_aaa                          02-JAN-21                    2 04-JAN-23 Name changed from gol to gol_aaa***    
         3 cav                              02-JAN-23                    2 04-JAN-23 ID 3 IS DELETED                        
         4 bhf                              02-JAN-25                    2 04-JAN-23 ID 4 IS DELETED                        
         5 zzz                              02-JAN-28                    2 04-JAN-23 ***                                    
         6 sss                              02-JAN-25                    2 04-JAN-23 ID 6 IS DELETED                        
         7 yyy                              02-MAR-26                    2 04-JAN-23 NEW ROW  *  ID=7                       
*/

Please consider this just like an option that could be used as a kind of changes tracker if you realy could not use something like materialized view or db trigger.

And if you want to know the data from snapshot 1 then:

Select ID, NAME, CREATION_TIME 
From DESTINATION  
Where SNAP_ID_SEQUENCE = 1 And
      STATUS NOT LIKE '%IS DELETED'
Order By ID

/*
        ID NAME                             CREATION_TIME
---------- -------------------------------- -------------
         1 bola                             01-JAN-20     
         2 gol                              02-JAN-21     
         5 zzz                              02-JAN-28     
         6 sss                              02-JAN-25    
*/

... for snapshot 2 :

/*
        ID NAME                             CREATION_TIME
---------- -------------------------------- -------------
         1 bola                             01-JAN-20     
         2 gol_aaa                          02-JAN-21     
         5 zzz                              02-JAN-28     
         7 yyy                              02-MAR-26    
*/

CodePudding user response：

You are going to need a database link of one kind or another, or you are going to need a middleware solution like NiFi or some such to act as an intermediary.

In either case with a DB link you need a merge to replicate the inserts and updates (such as the one in your example), and a separate delete command to handle the deletions. You can't do it all in a single command. If you can't modify the structure of the target database, then you'll have to create a link from the Oracle source database, and use something like this (no promises on performance):

merge into destination@dblink d
using (select * from source_after) sa on (d.id = sa.id)
when matched then update
  set 
    d.name = sa.name,
    d.creation_time = sa.creation_time 
when not matched then
  insert (
    d.id,
    d.name,
    d.creation_time
  )
  values
  (
    sa.id,
    sa.name,
    sa.creation_time
  );

delete from destination@dblink where id not in 
    (select id from source_after);

If you don't want a direct DB link (which isn't always possible or practical) then use a trigger in the source database to record the various transactions and their details like relevant column values in an audit table (Oracle doesn't have a convenient Change Data Capture feature in its most recent versions, so you have to make your own). Then use an external tool like NiFi to read that table, convert the audit records into individual DML commands that can be queued and applied to the target database. There are a number of different ways to do that with NiFi, and other data pipeline tools that could do the same thing as well, so I can't really get into specifics there, but hopefully you get the idea.

Also, if using an external tool like NiFi then anticipate the possibility that at some point the sync process will fail and you'll need to do a complete refresh on your target: plan a mechanism that can do a complete refresh (in fact do that first) as well as maintaining incremental updates over time.