I'm trying to join two tables in BigQuery based on an id and the closest date match.
Transaction Table:
transactionId | dateTime | productId |
---|---|---|
4a50665e | 2022-05-13T14:12:55 | abc |
7d5889cd | 2022-05-22T16:10:21 | abc |
Product Log Table (log of when each productId is updated to a new version):
dateTime | productId | version |
---|---|---|
2022-05-19T06:37:24 | abc | v2 |
2022-05-12T04:38:23 | xyz | v1 |
2022-05-10T09:57:54 | abc | v1 |
I want to add a version column to the transaction table by looking up from the product log table, based on productId match and the dateTime match from the product table. To get the active product version at the time of the transaction.
Desired Result:
transactionId | dateTime | productId | version |
---|---|---|---|
4a50665e | 2022-05-13T14:12:55 | abc | v1 |
7d5889cd | 2022-05-22T16:10:21 | abc | v2 |
Something like this
SELECT
t.*,
p.version
FROM
transaction_table t
LEFT JOIN
product_log_table p
ON
t.productId = p.productId AND
t.datetime < p.dateTime
But that doesn't work. I've tried searching a lot and tried a number of solutions but can't get anything to work. Should be simple? How do I do this?
Thanks for any help.
CodePudding user response:
Consider below approach
select any_value(t).*,
string_agg(version, '' order by p.dateTime desc limit 1) as version
from Transaction_Table t
join Product_Log_Table p
on t.productId = p.productId
and t.dateTime >= p.dateTime
group by format('%t', t)
if applied to sample data in your question - output is