I have the following two example tables
Orders Table
order_id | linked_order1 | linked_order2 |
---|---|---|
1001 | L005 | null |
1002 | null | null |
1003 | L006 | L007 |
Invoices Table
order_id | linked_order_id | charge |
---|---|---|
1001 | null | 4.27 |
1002 | null | 9.82 |
1003 | null | 7.42 |
null | L005 | 2.12 |
null | L006 | 1.76 |
null | L007 | 3.20 |
I need to join these so the charges of all the orders (linked and otherwise) can be shown as part of the single order row. My desired output is something like this.
Desired Output
order_id | linked_order1 | linked_order2 | invoices.charge | invoices.order_id | invoices.linked_order_id |
---|---|---|---|---|---|
1001 | L005 | null | 4.27 | 1001 | null |
2.12 | null | L005 | |||
1002 | null | null | 9.82 | null | null |
1003 | L006 | L007 | 7.42 | null | null |
1.76 | null | L006 | |||
3.20 | null | L007 |
I can manage to get the main order into the table as follows.
SELECT
orders,
ARRAY(
SELECT AS STRUCT * FROM `invoices_table` WHERE order=orders.order_id) AS invoice
FROM
`orders_table` AS orders
I can run a separate query to union all of the invoice results into a single table for given order ids but I can't combine this with the above query with out getting errors.
Something like this...
SELECT
orders,
ARRAY(
SELECT AS STRUCT * FROM
(SELECT * FROM `invoices_table` WHERE order=orders.order_id
UNION ALL SELECT * FROM `invoices_table` WHERE linked_order_id=orders.linked_order1
UNION ALL SELECT * FROM `invoices_table` WHERE linked_order_id=orders.linked_order2)
) AS invoice
FROM
`orders_table` AS orders
But this gives me the correlated subqueries error.
[Update]
This is much simpler than I thought. The following query gives me what I was after.
SELECT
orders,
ARRAY(
SELECT AS STRUCT * FROM `invoices_table` WHERE order=orders.order_id OR linked_order_id IN(orders.linked_order1, orders.linked_order2)) AS invoice
FROM
`orders_table` AS orders
CodePudding user response:
Using CROSS JOINS,
SELECT o.*, ARRAY_AGG(i) invoices
FROM Orders o, Invoices i
WHERE o.order_id = i.order_id
OR i.linked_order_id IN (o.linked_order1, o.linked_order2)
GROUP BY 1, 2, 3;
Query results
[UPDATE]
Sometimes the query using OR conditions in WHERE clause might show poor perfomrance in large dataset. In that case you may try below query instead that generates same result.
SELECT o.*, ARRAY_AGG(i) invoices FROM (
SELECT o, i FROM Orders o JOIN Invoices i USING (order_id)
UNION ALL
SELECT o, i FROM Orders o JOIN Invoices i ON i.linked_order_id IN (o.linked_order1, o.linked_order2)
) GROUP BY 1, 2, 3;
CodePudding user response:
For the desired output table, the full outer join
is the right command.
with tblA as (Select order_id, 1 linked_order1, 2 linked_order2, from unnest([1,2,3]) order_id),
tblB as (Select order_id, 109.99 charge from unnest([3,4,5]) order_id
union all select null order_id, * from unnest([50.1,29.99]) charge
)
Select *
from tblA
full join tblB
using(order_id)
For your setting, there is the need to have several joining conditions. Therefore, the first table is used three times, for each joining key.
with tblA as (Select order_id, "L05" linked_order1, "L2" linked_order2, from unnest(["1","2","3"]) order_id),
tblB as (Select order_id, null linked_order_id, 109.99 charge from unnest(["3","4","5"]) order_id
union all select null order_id, "L05" , * from unnest([50.1,29.99]) charge
)
Select A.order_id,linked_order1,linked_order2, array_agg(struct(tblB.order_id,linked_order_id,charge))
from
(
Select * from tblA, unnest([order_id,linked_order1,linked_order2]) as tmp_id
) A
full join tblB
on tmp_id = ifnull(tblB.order_id,linked_order_id)
where charge is not null #or tmp_id=A.order_id
group by 1,2,3