I am asking for your help with a problem joining two tables in SQL Server 2016 (v13).
I have two tables, Revenues and Cashins.
Revenues:
RevenueID | ProductID | InvoiceNo | Amount |
---|---|---|---|
123 | 456 | 987 | 1000 |
234 | 456 | 987 | 1000 |
Cashins:
CashinID | ProductID | InvoiceNo | Amount |
---|---|---|---|
ABC | 456 | 987 | 1000 |
CDE | 456 | 987 | 1000 |
The goal is to match cashins automatically to revenues (but only once!).
Both tables have their own unique IDs, but the columns used to join them are:
- ProductID
- InvoiceNo
- Amount
For entries with only one row in each table with those criteria, everything works fine.
Sometimes though, there are several rows that have the same value within these columns (as above) but with a unique ID (this is no error, but the way it is supposed to be).
The problem is that joining on these columns then produces a Cartesian product of the matching rows.
Here are the statements to recreate the tables:
DROP TABLE IF EXISTS Revenues
GO
CREATE TABLE Revenues
(
RevenueID [nvarchar](10) NULL,
ProductID [nvarchar](10) NULL,
InvoiceNo [nvarchar](10) NULL,
Amount money NULL
)
GO
DROP TABLE IF EXISTS CashIns
GO
CREATE TABLE CashIns
(
CashinID [nvarchar](10) NULL,
ProductID [nvarchar](10) NULL,
InvoiceNo [nvarchar](10) NULL,
Amount money NULL
)
GO
INSERT INTO [Revenues] VALUES ('123', '456', '987', 1000)
INSERT INTO [Revenues] VALUES ('234', '456', '987', 1000)
INSERT INTO [CashIns] VALUES ('ABC', '456', '987', 1000)
INSERT INTO [CashIns] VALUES ('CDE', '456', '987', 1000)
Desired output:
RevenueID | ProductID | InvoiceNo | Amount | CashinID |
---|---|---|---|---|
123 | 456 | 987 | 1000 | ABC |
234 | 456 | 987 | 1000 | CDE |
SELECT
R.RevenueID,
R.ProductID,
R.InvoiceNo,
R.Amount,
C.CashinID
FROM
[Revenues] R
LEFT JOIN
[CashIns] C ON R.ProductID = C.ProductID
AND R.InvoiceNo = C.InvoiceNo
AND R.Amount = C.Amount
Results:
RevenueID | ProductID | InvoiceNo | Amount | CashinID |
---|---|---|---|---|
123 | 456 | 987 | 1000 | ABC |
123 | 456 | 987 | 1000 | CDE |
234 | 456 | 987 | 1000 | ABC |
234 | 456 | 987 | 1000 | CDE |
This makes sense in theory, but I can't find a solution where each row is used only once.
Two things I found and tried are window functions and the OUTER APPLY operator with a TOP(1) selection. Both gave the same result:
SELECT
*
FROM
[Revenues] R
OUTER APPLY
(SELECT TOP(1) *
FROM [CashIns] C) C
This returns the desired columns from the Revenues table, but matches only the first row from the Cashins table:
RevenueID | ProductID | InvoiceNo | Amount | CashinID |
---|---|---|---|---|
123 | 456 | 987 | 1000 | ABC |
234 | 456 | 987 | 1000 | ABC |
I also thought about updating the Revenues table so that the matched CashinID is stored next to each row, and then checking every time that the CashinID is not yet used within that table, but I couldn't make it work...
Many thanks in advance for any help or hint in the right direction!
CodePudding user response:
As I said in my comment, you have a fundamental problem with your data relationships. One of your tables needs to reference the unique identifier of the other. If it doesn't, the only option is to order the transactions in both tables and join them by row number. You're using a hope and a prayer to join your data instead of irrefutable identifiers.
--This example orders the transactions in each transaction table and uses
--the order number to join them.
WITH RevPrelim AS (
    SELECT *
        , ROW_NUMBER() OVER(PARTITION BY InvoiceNo, ProductID, Amount ORDER BY RevenueID) AS row_num
    FROM [Revenues] AS R
), CashinsPrelim AS (
    SELECT *
        , ROW_NUMBER() OVER(PARTITION BY InvoiceNo, ProductID, Amount ORDER BY CashinID) AS row_num
    FROM [CashIns] AS C
)
SELECT *
FROM RevPrelim AS r
LEFT OUTER JOIN CashinsPrelim AS c
    ON c.ProductID = r.ProductID
    AND c.InvoiceNo = r.InvoiceNo
    AND c.Amount = r.Amount
    AND c.row_num = r.row_num
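If you want to store the pairing so that one table actually references the other's identifier, something like the following might work. It is only a sketch: the column name MatchedRevenueID is made up for illustration, and it assumes RevenueID and CashinID really are unique even though the table definitions don't enforce that.

```sql
-- Hypothetical column (not in the original schema): holds the RevenueID
-- that each cash-in has been paired with.
ALTER TABLE CashIns ADD MatchedRevenueID nvarchar(10) NULL;
GO

-- Number the rows within each (InvoiceNo, ProductID, Amount) group on both
-- sides, then pair the n-th revenue with the n-th cash-in of the same group.
WITH RevPrelim AS (
    SELECT RevenueID, ProductID, InvoiceNo, Amount
        , ROW_NUMBER() OVER(PARTITION BY InvoiceNo, ProductID, Amount ORDER BY RevenueID) AS row_num
    FROM Revenues
), CashinsPrelim AS (
    SELECT CashinID, ProductID, InvoiceNo, Amount
        , ROW_NUMBER() OVER(PARTITION BY InvoiceNo, ProductID, Amount ORDER BY CashinID) AS row_num
    FROM CashIns
)
UPDATE c
SET c.MatchedRevenueID = r.RevenueID
FROM CashIns AS c
INNER JOIN CashinsPrelim AS cp
    ON cp.CashinID = c.CashinID          -- assumes CashinID is unique
INNER JOIN RevPrelim AS r
    ON r.ProductID = cp.ProductID
    AND r.InvoiceNo = cp.InvoiceNo
    AND r.Amount = cp.Amount
    AND r.row_num = cp.row_num;
```

With the sample rows, the cash-in that sorts first by CashinID is paired with RevenueID 123 and the second with 234. Cash-ins that have no counterpart keep a NULL in MatchedRevenueID. The pairing is recomputed from scratch on every run, so if new rows arrive between runs you would probably want to exclude rows that are already matched on both sides before numbering.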