Home > Software engineering >  BigQuery: Subquery with UNION as ARRAY
BigQuery: Subquery with UNION as ARRAY

Time:09-06

I have the following two example tables

Orders Table

order_id linked_order1 linked_order2
1001 L005 null
1002 null null
1003 L006 L007

Invoices Table

order_id linked_order_id charge
1001 null 4.27
1002 null 9.82
1003 null 7.42
null L005 2.12
null L006 1.76
null L007 3.20

I need to join these so the charges of all the orders (linked and otherwise) can be shown as part of the single order row. My desired output is something like this.

Desired Output

order_id linked_order1 linked_order2 invoices.charge invoices.order_id invoices.linked_order_id
1001 L005 null 4.27 1001 null
2.12 null L005
1002 null null 9.82 null null
1003 L006 L007 7.42 null null
1.76 null L006
3.20 null L007

I can manage to get the main order into the table as follows.

SELECT
  orders,
  ARRAY(
  SELECT AS STRUCT * FROM `invoices_table` WHERE order=orders.order_id) AS invoice
FROM
  `orders_table` AS orders

I can run a separate query to union all of the invoice results into a single table for given order ids but I can't combine this with the above query with out getting errors.

Something like this...

SELECT
  orders,
  ARRAY(
  SELECT AS STRUCT * FROM 
(SELECT * FROM `invoices_table` WHERE order=orders.order_id
    UNION ALL SELECT * FROM `invoices_table` WHERE linked_order_id=orders.linked_order1
    UNION ALL SELECT * FROM `invoices_table` WHERE linked_order_id=orders.linked_order2)
) AS invoice
FROM
  `orders_table` AS orders

But this gives me the correlated subqueries error.

[Update]

This is much simpler than I thought. The following query gives me what I was after.

SELECT
  orders,
  ARRAY(
  SELECT AS STRUCT * FROM `invoices_table` WHERE order=orders.order_id OR linked_order_id IN(orders.linked_order1, orders.linked_order2)) AS invoice
FROM
  `orders_table` AS orders

CodePudding user response:

Using CROSS JOINS,

SELECT o.*, ARRAY_AGG(i) invoices
  FROM Orders o, Invoices i 
 WHERE o.order_id = i.order_id 
    OR i.linked_order_id IN (o.linked_order1, o.linked_order2)
 GROUP BY 1, 2, 3;
Query results

enter image description here

[UPDATE]

Sometimes the query using OR conditions in WHERE clause might show poor perfomrance in large dataset. In that case you may try below query instead that generates same result.

SELECT o.*, ARRAY_AGG(i) invoices FROM (
  SELECT o, i FROM Orders o JOIN Invoices i USING (order_id)
   UNION ALL
  SELECT o, i FROM Orders o JOIN Invoices i ON i.linked_order_id IN (o.linked_order1, o.linked_order2)
) GROUP BY 1, 2, 3;

CodePudding user response:

For the desired output table, the full outer join is the right command.

with tblA as (Select order_id, 1 linked_order1, 2   linked_order2, from unnest([1,2,3]) order_id),
tblB as (Select order_id, 109.99 charge from unnest([3,4,5]) order_id
union all select null  order_id, * from unnest([50.1,29.99]) charge
)

Select *  
  from tblA
  full join tblB
  using(order_id)

For your setting, there is the need to have several joining conditions. Therefore, the first table is used three times, for each joining key.

with tblA as (Select order_id, "L05" linked_order1, "L2"    linked_order2, from unnest(["1","2","3"]) order_id),
tblB as (Select order_id, null linked_order_id, 109.99 charge from unnest(["3","4","5"]) order_id
union all select null  order_id, "L05" , * from unnest([50.1,29.99]) charge
)

Select A.order_id,linked_order1,linked_order2, array_agg(struct(tblB.order_id,linked_order_id,charge))
  from 
  (
    Select * from tblA, unnest([order_id,linked_order1,linked_order2]) as tmp_id
  ) A
  full join tblB
  on tmp_id = ifnull(tblB.order_id,linked_order_id)
  where charge is not null #or tmp_id=A.order_id
  group by 1,2,3
  • Related