Suppose I have two tables, source and destination. The source table changes over time, for example from one day to the next. To give an example I will create two tables here, source_before and source_after, but in reality they are the very same table on which some DML statements happen over time.
create table source_before (
id int,
name varchar2(40),
creation_time date
);
insert all
into source_before values (1,'bola','01-Jan-20')
into source_before values (2,'gol','02-Jan-21')
into source_before values (3,'cav','02-Jan-23')
into source_before values (4,'bhf','02-Jan-25')
select * from dual;
select * from source_before;
1 bola 01-JAN-20
2 gol 02-JAN-21
3 cav 02-JAN-23
4 bhf 02-JAN-25
create table source_after (
id int,
name varchar2(40),
creation_time date
);
insert all
into source_after values (1,'bola','01-Jan-20')
into source_after values (2,'gol','02-Jan-21')
into source_after values (5,'zzz','02-Jan-28')
into source_after values (6,'sss','02-Jan-25')
select * from dual;
select * from source_after;
1 bola 01-JAN-20
2 gol 02-JAN-21
5 zzz 02-JAN-28
6 sss 02-JAN-25
Now let's assume there is a destination table that was last updated while source_before existed, and thus contains the same data as source_before.
create table destination (
id int,
name varchar2(40),
creation_time date
);
insert all
into destination values (1,'bola','01-Jan-20')
into destination values (2,'gol','02-Jan-21')
into destination values (3,'cav','02-Jan-23')
into destination values (4,'bhf','02-Jan-25')
select * from dual;
1 bola 01-JAN-20
2 gol 02-JAN-21
3 cav 02-JAN-23
4 bhf 02-JAN-25
Now I want to update destination with the changes that source_after reflects, namely the deletion of ids 3 and 4 and the insertion of ids 5 and 6. I can do this with the following statement for the insertions, but not for the deletions.
merge into destination d
using (select * from source_after) sa on (d.id = sa.id)
when matched then update
set
d.name = sa.name,
d.creation_time = sa.creation_time
when not matched then
insert (
d.id,
d.name,
d.creation_time
)
values
(
sa.id,
sa.name,
sa.creation_time
);
select * from destination;
1 bola 01-JAN-20
2 gol 02-JAN-21
3 cav 02-JAN-23
4 bhf 02-JAN-25
6 sss 02-JAN-25
5 zzz 02-JAN-28
As we can see, rows 5 and 6 have been inserted, but the statement is unable to delete rows 3 and 4. The goal is to reflect the changes and capture both the insertions and the deletions on the source table. Therefore, the result in destination should be:
1 bola 01-JAN-20
2 gol 02-JAN-21
5 zzz 02-JAN-28
6 sss 02-JAN-25
CodePudding user response:
A trigger can solve this issue for you.
One option is to add some field(s) to the destination table to track the date, etc., of when a row was deleted or updated. At larger scale I would even suggest inserting a new version of the row on update, and marking the record as deleted on delete. This allows you to keep the entire history.
CREATE OR REPLACE TRIGGER TakeChangesToDestination
AFTER UPDATE OR INSERT OR DELETE ON SourceTable
FOR EACH ROW
BEGIN
    IF INSERTING THEN
        -- insert into destination
    ELSIF UPDATING THEN
        -- update destination by ID OR insert a new row for tracking history
    ELSIF DELETING THEN
        -- update destination by ID - mark as DELETED
    END IF;
END;
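For illustration only, a filled-in version of that skeleton might look like the sketch below. It assumes SourceTable is the single source table from the question, and that destination has been extended with two extra tracking columns, here named CHANGE_TIME and STATUS (both invented names), so that deletes are recorded instead of lost:
CREATE OR REPLACE TRIGGER TakeChangesToDestination
AFTER INSERT OR UPDATE OR DELETE ON SourceTable
FOR EACH ROW
BEGIN
    IF INSERTING THEN
        -- new source row: copy it into destination and time-stamp it
        INSERT INTO destination (id, name, creation_time, change_time, status)
        VALUES (:NEW.id, :NEW.name, :NEW.creation_time, SYSDATE, 'INSERTED');
    ELSIF UPDATING THEN
        -- changed source row: overwrite the destination row by ID
        -- (alternatively, insert a new row version here to keep the full history)
        UPDATE destination
           SET name          = :NEW.name,
               creation_time = :NEW.creation_time,
               change_time   = SYSDATE,
               status        = 'UPDATED'
         WHERE id = :OLD.id;
    ELSIF DELETING THEN
        -- deleted source row: keep it in destination but mark it as deleted
        UPDATE destination
           SET change_time = SYSDATE,
               status      = 'DELETED'
         WHERE id = :OLD.id;
    END IF;
END;
/
With that in place, reading the current state from destination is just a matter of filtering out the rows whose STATUS is 'DELETED'.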
CodePudding user response:
If a materialized view or a db trigger can't be used, then you should consider creating your own change-tracking system. One option is to define your destination table with some additional columns.
With initial data in source_before as:
/* Initial data
source_before 0
----------------------
1 bola 01-JAN-20
2 gol 02-JAN-21
3 cav 02-JAN-23
4 bhf 02-JAN-25
*/
The same data are the starting point for source_after, so it looks the same initially.
You could have that initial data written into your destination table like below:
/*
ID NAME CREATION_TIME SNAP_ID_SEQUENCE SNAP_TIME STATUS
---- ---- ------------- ------------------ ------------------ -----------------
1 bola 01-Jan-20 0 01-Jan-23 19:00:00 INIT
2 gol 02-Jan-21 0 01-Jan-23 19:00:00 INIT
3 cav 02-Jan-23 0 01-Jan-23 19:00:00 INIT
4 bhf 02-Jan-25 0 01-Jan-23 19:00:00 INIT
*/
Three columns are added to the destination table (you could do it differently), just as an idea of a potential solution.
The SNAP_ID_SEQUENCE column's future values should come from a sequence (or something similar), so that you can reference the right set of records to compare with the actual data in the source_after table every time you take a snapshot.
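As a sketch of that setup (column types/sizes and the sequence name are only illustrative), the extra columns and the sequence could be created and initialised like this:
-- extend the destination table from the question with the three tracking columns
ALTER TABLE destination ADD (
    snap_id_sequence NUMBER,
    snap_time        DATE,
    status           VARCHAR2(200)
);

-- sequence that will supply next_sequence_number for each snapshot
CREATE SEQUENCE snap_seq START WITH 1;

-- mark the rows that are already there as the initial snapshot (snapshot 0)
UPDATE destination
   SET snap_id_sequence = 0,
       snap_time        = SYSDATE,
       status           = 'INIT';

COMMIT;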
Now, if your source_after table changes like in the question (rows 3 and 4 deleted and rows 5 and 6 added):
/*
after
----------------------
1 bola 01-JAN-20
2 gol 02-JAN-21
5 zzz 02-JAN-28
6 sss 02-JAN-25
*/
If you run the code below, there will be 6 more rows in the destination table:
INSERT INTO DESTINATION (ID, NAME, CREATION_TIME, SNAP_ID_SEQUENCE, SNAP_TIME, STATUS)
(
Select
COALESCE(a.ID, b.ID) "ID",
COALESCE(a.NAME, b.NAME) "NAME",
COALESCE(a.CREATION_TIME, b.CREATION_TIME) "CREATION_TIME",
1 "SNAP_ID_SEQUENCE", -- next_sequence_number = 1
SYSDATE "SNAP_TIME",
CASE WHEN Nvl(a.ID, -1) = b.ID THEN
CASE WHEN a.NAME != b.NAME THEN 'Name changed from ' || b.NAME || ' to ' || a.NAME END || '***' ||
CASE WHEN a.CREATION_TIME != b.CREATION_TIME THEN 'Creation changed from ' || To_Char(b.CREATION_TIME, 'dd-Mon-yy hh24:mi:ss') || ' to ' || To_Char(a.CREATION_TIME, 'dd-Mon-yy hh24:mi:ss') END
ELSE 'ID ' || b.ID || ' ' || 'IS DELETED'
END "STATUS"
From
destination b
Left Join
source_after a ON(a.ID = b.ID)
Where b.SNAP_ID_SEQUENCE = 0 -- next_sequence_number - 1 = 0
UNION ALL
Select
a.ID, a.NAME, a.CREATION_TIME,
1 "SNAP_ID_SEQUENCE", -- next_sequence_number = 1
SYSDATE "SNAP_TIME",
'NEW ROW * ID=' || a.ID "STATUS"
From
source_after a
Left Join
destination b ON(b.ID = a.ID And b.SNAP_ID_SEQUENCE = 0 ) -- next_sequence_number - 1 = 0
Where
b.ID Is Null
);
--
-- I n s e r t e d r o w s :
/* S n a p s h o t 1
ID NAME CREATION_TIME SNAP_ID_SEQUENCE SNAP_TIME STATUS
---------- -------------------------------- ------------- ---------------- --------- -------------------------
1 bola 01-JAN-20 1 04-JAN-23 ***
2 gol 02-JAN-21 1 04-JAN-23 ***
3 cav 02-JAN-23 1 04-JAN-23 ID 3 IS DELETED
4 bhf 02-JAN-25 1 04-JAN-23 ID 4 IS DELETED
5 zzz 02-JAN-28 1 04-JAN-23 NEW ROW * ID=5
6 sss 02-JAN-25 1 04-JAN-23 NEW ROW * ID=6
*/
Every time you take the next snapshot, the value of SNAP_ID_SEQUENCE should be increased and passed to the code in the marked places. Note that in the ON clause of the second query and the WHERE clause of the first query, next_sequence_number - 1 is used.
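To avoid editing those literals by hand for every snapshot, the same statement could be wrapped in a small PL/SQL block that takes next_sequence_number from the illustrative snap_seq sequence sketched earlier (again, just one possible way to do it):
DECLARE
    v_snap PLS_INTEGER;   -- next_sequence_number for this snapshot
BEGIN
    SELECT snap_seq.NEXTVAL INTO v_snap FROM dual;

    INSERT INTO destination (id, name, creation_time, snap_id_sequence, snap_time, status)
    SELECT COALESCE(a.id, b.id),
           COALESCE(a.name, b.name),
           COALESCE(a.creation_time, b.creation_time),
           v_snap,
           SYSDATE,
           CASE WHEN Nvl(a.id, -1) = b.id THEN
                    CASE WHEN a.name != b.name
                         THEN 'Name changed from ' || b.name || ' to ' || a.name END || '***' ||
                    CASE WHEN a.creation_time != b.creation_time
                         THEN 'Creation changed from ' || To_Char(b.creation_time, 'dd-Mon-yy hh24:mi:ss') ||
                              ' to ' || To_Char(a.creation_time, 'dd-Mon-yy hh24:mi:ss') END
                ELSE 'ID ' || b.id || ' IS DELETED'
           END
    FROM   destination b
    LEFT JOIN source_after a ON (a.id = b.id)
    WHERE  b.snap_id_sequence = v_snap - 1
    UNION ALL
    SELECT a.id, a.name, a.creation_time, v_snap, SYSDATE, 'NEW ROW * ID=' || a.id
    FROM   source_after a
    LEFT JOIN destination b ON (b.id = a.id AND b.snap_id_sequence = v_snap - 1)
    WHERE  b.id IS NULL;
END;
/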
Second sample:
source_after changes: ID=6 is deleted, ID=7 is inserted and in ID=2 NAME has changed.
/*
after
----------------------
1 bola 01-JAN-20
2 gol_aaa 02-JAN-21
5 zzz 02-JAN-28
7 yyy 02-MAR-26
*/
If you run the above insert command but with sequence number 2, then the code and the inserted rows would be as below:
INSERT INTO DESTINATION (ID, NAME, CREATION_TIME, SNAP_ID_SEQUENCE, SNAP_TIME, STATUS)
(
Select
COALESCE(a.ID, b.ID) "ID",
COALESCE(a.NAME, b.NAME) "NAME",
COALESCE(a.CREATION_TIME, b.CREATION_TIME) "CREATION_TIME",
2 "SNAP_ID_SEQUENCE", -- next_sequence_number = 2
SYSDATE "SNAP_TIME",
CASE WHEN Nvl(a.ID, -1) = b.ID THEN
CASE WHEN a.NAME != b.NAME THEN 'Name changed from ' || b.NAME || ' to ' || a.NAME END || '***' ||
CASE WHEN a.CREATION_TIME != b.CREATION_TIME THEN 'Creation changed from ' || To_Char(b.CREATION_TIME, 'dd-Mon-yy hh24:mi:ss') || ' to ' || To_Char(a.CREATION_TIME, 'dd-Mon-yy hh24:mi:ss') END
ELSE 'ID ' || b.ID || ' ' || 'IS DELETED'
END "STATUS"
From
destination b
Left Join
source_after a ON(a.ID = b.ID)
Where b.SNAP_ID_SEQUENCE = 1 -- next_sequence_number - 1 = 1
UNION ALL
Select
a.ID, a.NAME, a.CREATION_TIME,
2 "SNAP_ID_SEQUENCE", -- next_sequence_number = 2
SYSDATE "SNAP_TIME",
'NEW ROW * ID=' || a.ID "STATUS"
From
source_after a
Left Join
destination b ON(b.ID = a.ID And b.SNAP_ID_SEQUENCE = 1 ) -- next_sequence_number - 1 = 1
Where
b.ID Is Null
);
--
-- I n s e r t e d r o w s :
/* S n a p s h o t 2
ID NAME CREATION_TIME SNAP_ID_SEQUENCE SNAP_TIME STATUS
---------- -------------------------------- ------------- ---------------- --------- ---------------------------------------
1 bola 01-JAN-20 2 04-JAN-23 ***
2 gol_aaa 02-JAN-21 2 04-JAN-23 Name changed from gol to gol_aaa***
3 cav 02-JAN-23 2 04-JAN-23 ID 3 IS DELETED
4 bhf 02-JAN-25 2 04-JAN-23 ID 4 IS DELETED
5 zzz 02-JAN-28 2 04-JAN-23 ***
6 sss 02-JAN-25 2 04-JAN-23 ID 6 IS DELETED
7 yyy 02-MAR-26 2 04-JAN-23 NEW ROW * ID=7
*/
Please consider this just as an option that could be used as a kind of change tracker if you really cannot use something like a materialized view or a db trigger.
And if you want to see the data as of snapshot 1:
Select ID, NAME, CREATION_TIME
From DESTINATION
Where SNAP_ID_SEQUENCE = 1 And
STATUS NOT LIKE '%IS DELETED'
Order By ID;
/*
ID NAME CREATION_TIME
---------- -------------------------------- -------------
1 bola 01-JAN-20
2 gol 02-JAN-21
5 zzz 02-JAN-28
6 sss 02-JAN-25
*/
... and for snapshot 2:
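That would presumably be the same query, just with SNAP_ID_SEQUENCE = 2:
Select ID, NAME, CREATION_TIME
From DESTINATION
Where SNAP_ID_SEQUENCE = 2 And
STATUS NOT LIKE '%IS DELETED'
Order By ID;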
/*
ID NAME CREATION_TIME
---------- -------------------------------- -------------
1 bola 01-JAN-20
2 gol_aaa 02-JAN-21
5 zzz 02-JAN-28
7 yyy 02-MAR-26
*/
CodePudding user response:
You are going to need either a database link of one kind or another, or a middleware solution like NiFi to act as an intermediary.
In either case the logic is the same: you need a MERGE to replicate the inserts and updates (such as the one in your example) and a separate DELETE command to handle the deletions; you can't do it all in a single command. If you can't modify the structure of the target database, then you'll have to create a DB link from the Oracle source database and use something like this (no promises on performance):
merge into destination@dblink d
using (select * from source_after) sa on (d.id = sa.id)
when matched then update
set
d.name = sa.name,
d.creation_time = sa.creation_time
when not matched then
insert (
d.id,
d.name,
d.creation_time
)
values
(
sa.id,
sa.name,
sa.creation_time
);
delete from destination@dblink where id not in
(select id from source_after);
If you don't want a direct DB link (which isn't always possible or practical), then use a trigger in the source database to record the various transactions and their details, such as the relevant column values, in an audit table (Oracle doesn't have a convenient Change Data Capture feature in its most recent versions, so you have to build your own). Then use an external tool like NiFi to read that table and convert the audit records into individual DML commands that can be queued and applied to the target database. There are a number of different ways to do that with NiFi, and other data pipeline tools could do the same thing as well, so I can't really get into specifics here, but hopefully you get the idea.
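A minimal sketch of such an audit table and trigger could look like the following (the table and trigger names are made up, and source stands for the single source table from the question, whatever it is actually called):
-- audit table that the external tool (NiFi etc.) reads, replays and purges
CREATE TABLE source_audit (
    audit_id      NUMBER GENERATED ALWAYS AS IDENTITY,  -- 12c+; use a sequence on older versions
    operation     VARCHAR2(6),                          -- INSERT / UPDATE / DELETE
    id            INT,
    name          VARCHAR2(40),
    creation_time DATE,
    changed_at    DATE DEFAULT SYSDATE
);

CREATE OR REPLACE TRIGGER source_audit_trg
AFTER INSERT OR UPDATE OR DELETE ON source
FOR EACH ROW
DECLARE
    v_op VARCHAR2(6);
BEGIN
    IF INSERTING THEN
        v_op := 'INSERT';
    ELSIF UPDATING THEN
        v_op := 'UPDATE';
    ELSE
        v_op := 'DELETE';
    END IF;

    IF DELETING THEN
        -- record the old values so the delete can be replayed downstream
        INSERT INTO source_audit (operation, id, name, creation_time)
        VALUES (v_op, :OLD.id, :OLD.name, :OLD.creation_time);
    ELSE
        -- inserts and updates: record the new values
        INSERT INTO source_audit (operation, id, name, creation_time)
        VALUES (v_op, :NEW.id, :NEW.name, :NEW.creation_time);
    END IF;
END;
/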
Also, if you are using an external tool like NiFi, anticipate the possibility that at some point the sync process will fail and you'll need to do a complete refresh of your target: plan a mechanism that can do a complete refresh (in fact, build that first) as well as maintain incremental updates over time.
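The complete-refresh mechanism itself can stay very simple; for the DB-link variant above it could be little more than the following sketch (not tuned for large tables):
-- wipe the target and re-copy the current state of the source
DELETE FROM destination@dblink;

INSERT INTO destination@dblink (id, name, creation_time)
SELECT id, name, creation_time
FROM   source_after;

COMMIT;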