I have Table A
SnapshotDat | Invoice ID |
---|---|
2022-09-11 | 1111 |
2022-09-12 | 1111 |
2022-09-13 | 1111 |
2022-09-14 | 1111 |
2022-09-15 | 1111 |
2022-09-16 | 1111 |
2022-09-17 | 1111 |
2022-09-18 | 1111 |
2022-09-19 | 1111 |
2022-09-20 | 1111 |
2022-09-21 | 1111 |
2022-09-22 | 1111 |
2022-09-23 | 1111 |
2022-09-24 | 1111 |
2022-09-25 | 1111 |
Table B
Date | Invoice ID | Status |
---|---|---|
2022-09-11 | 1111 | draft |
2022-09-15 | 1111 | outstanding |
2022-09-20 | 1111 | pending |
2022-09-24 | 1111 | paid |
And I want to establish a join by Invoice ID and Dates, to have this table result
SnapshotDat | Invoice ID | Status |
---|---|---|
2022-09-11 | 1111 | draft |
2022-09-12 | 1111 | draft |
2022-09-13 | 1111 | draft |
2022-09-14 | 1111 | draft |
2022-09-15 | 1111 | outstanding |
2022-09-16 | 1111 | outstanding |
2022-09-17 | 1111 | outstanding |
2022-09-18 | 1111 | outstanding |
2022-09-19 | 1111 | outstanding |
2022-09-20 | 1111 | pending |
2022-09-21 | 1111 | pending |
2022-09-22 | 1111 | pending |
2022-09-23 | 1111 | pending |
2022-09-24 | 1111 | paid |
2022-09-25 | 1111 | paid |
Here's what I've tried:
SELECT a.SnapshotDate, a.invoiceid, b.status
FROM a
LEFT JOIN b ON a.invoiceid = b.invoiceid
AND a.SnapshotDate<=b.Date
CodePudding user response:
One option to solve this problem involves following these steps:
- applying
LEFT JOIN
between the two tables on matching "Invoice_ID" values and dates - computing a boolean (0/1) value when the status changes
- generating the new groups by computing a running sum of the boolean field
- extracting the maximum value in each partition (there will be only one non-null value for each partition)
WITH cte AS (
SELECT tabA.SnapshotDate,
tabA.Invoice_ID,
tabB.Status_,
SUM(CASE WHEN tabB.Status_ IS NOT NULL THEN 1 END) OVER(PARTITION BY tabA.Invoice_ID ORDER BY tabA.SnapshotDate) AS groups_
FROM tabA
LEFT JOIN tabB
ON tabA.Invoice_ID = tabB.Invoice_ID
AND tabA.SnapshotDate = tabB.Date_
)
SELECT SnapshotDate,
Invoice_ID,
MAX(Status_) OVER(PARTITION BY Invoice_ID, groups_) AS Status_
FROM cte
CodePudding user response:
Consider below approach (assuming your dates related columns are actually of date
data type
select a.*, status
from TableA as a
join (
select *,
ifnull(-1 lead(date) over(partition by invoiceId order by date), current_date()) lastDate
from TableB
) as b
on a.invoiceId = b.invoiceId
and snapshotDate between date and lastDate
if applied to sample data in your question - output is