How to make 2 remote tables in sync along with determination of Delta records-CodePudding

We have 10 tables on vendor system and same 10 tables on our DB side along with 10 _HISTORIC tables i.e. for each table in order to capture updated/new records.

We are reading the main tables from Vendor system using Informatica to truncate and load into our tables. How do we find Delta records without using Triggers and CDC as it comes with cost on vendor system.

4 tables are such that which have 200 columns and records around 31K in each with expectation that 100-500 records might update daily. We are using Left Join in Informatica to load new Records in our Main and _HISTORIC tables.

But what's efficient approach to find the Updated records of Vendor table and load them in our _HISTORIC table ?

For new Records using query :

-- NEW RECORDS
INSERT INTO TABLEA_HISTORIC
SELECT FROM TABLEA
LEFT JOIN TABLEB
ON A.PK = B.PK
WHERE B.PK IS NULL

CodePudding user response：

I believe a system versioned temporary table will be something you are looking for here. You can create a system versioned table for any table in SQL server 2016 or later.

for example, say I have a table Employee

CREATE TABLE Employee
(
EmployeeId VARCHAR(20) PRIMARY KEY,
EmployeeName VARCHAR(255) NOT NULL,
EmployeeDOJ DATE,
ValidFrom datetime2 GENERATED ALWAYS AS ROW START,--Automatically updated by the system when the row is updated
ValidTo datetime2 GENERATED ALWAYS AS ROW END,--auto-updated
 PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)--to set the row validity period
)

the column ValidFrom, ValidTo determines the time period on which that particular row was active.

For More Info refer the micorsoft article

https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15

CodePudding user response：

Create staging tables, load wipe&load them. Next, use them for finding the differences that need to be load into your target tables.

The CDC logic needs to be performed this way, but it will not affect your source system.

Another way - not sure if possible in your case - is to load partial data based on some source system date or key. This way you stage only the new data. This improves performance a lot, but makes finding the deletes in source impossible.