Home > Enterprise >  Join status by dates in bigquery
Join status by dates in bigquery

Time:10-18

I have Table A

SnapshotDat Invoice ID
2022-09-11 1111
2022-09-12 1111
2022-09-13 1111
2022-09-14 1111
2022-09-15 1111
2022-09-16 1111
2022-09-17 1111
2022-09-18 1111
2022-09-19 1111
2022-09-20 1111
2022-09-21 1111
2022-09-22 1111
2022-09-23 1111
2022-09-24 1111
2022-09-25 1111

Table B

Date Invoice ID Status
2022-09-11 1111 draft
2022-09-15 1111 outstanding
2022-09-20 1111 pending
2022-09-24 1111 paid

And I want to establish a join by Invoice ID and Dates, to have this table result

SnapshotDat Invoice ID Status
2022-09-11 1111 draft
2022-09-12 1111 draft
2022-09-13 1111 draft
2022-09-14 1111 draft
2022-09-15 1111 outstanding
2022-09-16 1111 outstanding
2022-09-17 1111 outstanding
2022-09-18 1111 outstanding
2022-09-19 1111 outstanding
2022-09-20 1111 pending
2022-09-21 1111 pending
2022-09-22 1111 pending
2022-09-23 1111 pending
2022-09-24 1111 paid
2022-09-25 1111 paid

Here's what I've tried:

SELECT a.SnapshotDate, a.invoiceid, b.status 
FROM a 
LEFT JOIN b ON a.invoiceid = b.invoiceid 
           AND a.SnapshotDate<=b.Date

CodePudding user response:

One option to solve this problem involves following these steps:

  • applying LEFT JOIN between the two tables on matching "Invoice_ID" values and dates
  • computing a boolean (0/1) value when the status changes
  • generating the new groups by computing a running sum of the boolean field
  • extracting the maximum value in each partition (there will be only one non-null value for each partition)
WITH cte AS (
    SELECT tabA.SnapshotDate,
           tabA.Invoice_ID,
           tabB.Status_,
           SUM(CASE WHEN tabB.Status_ IS NOT NULL THEN 1 END) OVER(PARTITION BY tabA.Invoice_ID ORDER BY tabA.SnapshotDate) AS groups_
    FROM      tabA
    LEFT JOIN tabB
           ON tabA.Invoice_ID = tabB.Invoice_ID 
          AND tabA.SnapshotDate = tabB.Date_
)
SELECT SnapshotDate,
       Invoice_ID,
       MAX(Status_) OVER(PARTITION BY Invoice_ID, groups_) AS Status_
FROM cte

CodePudding user response:

Consider below approach (assuming your dates related columns are actually of date data type

select a.*, status
from TableA as a
join (
  select *, 
    ifnull(-1   lead(date) over(partition by invoiceId order by date), current_date()) lastDate
  from TableB
) as b
on a.invoiceId = b.invoiceId
and snapshotDate between date and lastDate

if applied to sample data in your question - output is

enter image description here

  • Related