I have been tasked with building a history table in SQL. I have already built the base table which contains multiple left joins amongst other things. The base table will need to be compared to another table and only update specific columns that have changed, insert new rows where the key doesn't match.
Previously I have used other ETL tools which have GUI style built in SCD loaders, but I don't have such luxury in SQL Server. Here the merge statement can handle such operations. I have used the MERGE
statement before, but I become a bit stuck when handling flags and date fields based on the operation performed.
Here is the BASE table
KEY | CLIENT | QUANTITY | CONTRACT_NO | FC_COUNT | DELETE_FLAG | RECORD_UPDATED_DATE |
---|---|---|---|---|---|---|
345 | A | 1000 | 5015 | 1 | N | 31/12/9999 |
346 | B | 2000 | 9352 | 1 | N | 31/12/9999 |
347 | C | 3000 | 6903 | 1 | N | 31/12/9999 |
348 | D | 1000 | 7085 | 1 | N | 31/12/9999 |
349 | E | 1000 | 8488 | 1 | N | 31/12/9999 |
350 | F | 500 | 6254 | 1 | N | 31/12/9999 |
Here is the table I plan to merge with
KEY | CLIENT | QUANTITY | CONTRACT_NO | FC_COUNT |
---|---|---|---|---|
345 | A | 1299 | 5015 | 1 |
346 | B | 2011 | 9352 | 1 |
351 | Z | 5987 | 5541 | 1 |
The results I'm looking for are
KEY | CLIENT | QUANTITY | CONTRACT_NO | FC_COUNT | DELETE_FLAG | RECORD_UPDATED_DATE |
---|---|---|---|---|---|---|
345 | A | 1000 | 5015 | 1 | N | 06/07/2022 |
345 | A | 1299 | 5015 | 1 | N | 31/12/9999 |
346 | B | 2000 | 9352 | 1 | N | 06/07/2022 |
346 | B | 2011 | 9352 | 1 | N | 31/12/9999 |
347 | C | 3000 | 6903 | 1 | Y | 06/07/2022 |
348 | D | 1000 | 7085 | 1 | Y | 06/07/2022 |
349 | E | 1000 | 8488 | 1 | Y | 06/07/2022 |
350 | F | 500 | 6254 | 1 | Y | 06/07/2022 |
351 | Z | 5987 | 5541 | 1 | N | 31/12/9999 |
As we can see I have shown the changes, closed off the old records, marked with a date and a delete flag if they are missing but was there previous, as well as new new row with the new key and data
Would this be a MERGE
? Some direction on how to perform this sort of operation would be a great help. We have a lot of tables where we need to keep change history and this will help a lot going forward.
code shell attempt
SELECT
MAIN_KEY,
CLIENT,
QUANTITY,
CONTRACT_NO,
1 AS FC_COUNT,
NULL as DELETE_FLG_DD,
GETDATE() as RECORD_UPDATED_DATE
INTO #G1_DELTA
FROM
[dwh].STG_DTL
MERGE [dwh].[PRJ1_DELTA] TARGET
USING #G1_DELTA SOURCE
ON TARGET.MAIN_KEY = SOURCE.MAIN_KEY
WHEN MATCHED THEN INSERT
(
MAIN_KEY,
CLIENT,
QUANTITY,
CONTRACT_NO,
FC_COUNT,
DELETE_FLG_DD,
RECORD_UPDATED_DATE
)
VALUES
(
SOURCE.MAIN_KEY,
SOURCE.CLIENT,
SOURCE.QUANTITY,
SOURCE.CONTRACT_NO,
SOURCE.FC_COUNT,
SOURCE.DELETE_FLG_DD,
SOURCE.RECORD_UPDATED_DATE
)
CodePudding user response:
If you need to build a history table containing the updated information from your two tables, you first need to select updated information from your two tables.
The changes that need to be applied to your tables are on:
- "tab1.[DELETE_FLAG]", that should be updated to
'Y'
whenever it has a match with tab2 - "tab1.[RECORD_UPDATED_DATE]", that should be updated to the current date
- "tab2.[DELETE_FLAG]", missing and that should be initialized to
N
- "tab2.[RECORD_UPDATED_DATE]", missing and that should be initialized to your random date
9999-12-31
.
Once these changes are made, you can apply the UNION ALL
to get the rows from your two tables together.
Then, in order to generate a table, you can use a cte to select the output result set and use the INTO <table>
clause after a selection to generate your "history" table.
WITH cte AS (
SELECT tab1.[KEY],
tab1.[CLIENT],
tab1.[QUANTITY],
tab1.[CONTRACT_NO],
tab1.[FC_COUNT],
CASE WHEN tab2.[KEY] IS NOT NULL
THEN 'N'
ELSE 'Y'
END AS [DELETE_FLAG],
CAST(GETDATE() AS DATE) AS [RECORD_UPDATED_DATE]
FROM tab1
LEFT JOIN tab2
ON tab1.[KEY] = tab2.[KEY]
UNION ALL
SELECT *,
'N' AS [DELETE_FLAG],
'9999-12-31' AS [RECORD_UPDATED_DATE]
FROM tab2
)
SELECT *
INTO history
FROM cte
ORDER BY [KEY];
Check the demo here.