When we load new records onto our system, we first hold them in a staging loadedRecords
table, where we attempt to match them to existing records on system.
Once a loaded record is matched with an existing record, we update the loaded record to reference the id of the existing record, however it's possible for there to be multiple "identical" records, ie multiple records with the same matching criteria, and I can't find a way to do this such that each existing record is matched to a different loaded record.
I currently have this:
UPDATE loadedRecords
SET recordID = r.id
FROM loadedRecords
JOIN records as r
ON r.amount = loadedRecords.amount
WHERE loadedRecords.recordID IS NULL
However this causes all 3 loaded records with an amount
of 400
to be matched to the first existing record with an amount
of 400
:
id | amount | recordID |
---|---|---|
1 | 400 | 1 |
2 | 500 | 4 |
3 | 400 | 1 |
4 | 400 | 1 |
Note that ids 1
, 3
, and 4
were all matched with recordID 1
. The result I want from this is:
id | amount | recordID |
---|---|---|
1 | 400 | 1 |
2 | 500 | 4 |
3 | 400 | 2 |
4 | 400 | 3 |
The exact order of the matching doesn't matter, as long as each existing record is matched to only one loaded record, and each loaded record is matched to only one existing record.
It's also possible for there to be loaded records that have no match, including the possibility of there being more "identical" loaded records than matching "identical" existing records (eg if there are 4 loaded records with an amount of 400
, but only 3 existing records with that amount)
These extra loaded records should be left unmatched with a recordID
of NULL
CodePudding user response:
This is actually quite easy to do if you use a simple trick which is the ROW_NUMBER() windowing function. With this function we can give each group of amt values a unique integer starting at 1 in a new column. Here is the code that does that for your tables:
SELECT *, ROW_NUMBER() OVER (PARTITION BY amount) AS RN
FROM loadedRecords
SELECT *, ROW_NUMBER() OVER (PARTITION BY amount) AS RN
FROM records
Now we just take these two and join them
UPDATE loadedRecords
SET recordID = r.id
FROM loadedRecords
JOIN (
SELECT id, amount, ROW_NUMBER() OVER (PARTITION BY amount) AS RN
FROM records
) r ON r.amount = loadedRecords.amount AND
r.RN = ROW_NUMBER() OVER (PARTITION BY loadedRecords.amount)
WHERE loadedRecords.recordID IS NULL