Google bigquery is extremely slow on simple query-CodePudding

I have a simple query that count records from 4 tables (NO JOINING):

SELECT count(tx._sequence_num) as txc, 
count(o._sequence_num) as oc, 
count(t._sequence_num) as tc, 
count(ol._sequence_num) as olc  
FROM `xxx.TAX_TRANSACTIONS` tx, 
xxx.ORDER o, 
xxx.TRANSACTION t, 
xxx.ORDER_LINES ol

It never returns result to me

If I separate it to 4 queries like that:

SELECT count(tx._sequence_num) as txc FROM `xxx.TAX_TRANSACTIONS` tx; --202685
SELECT count(o._sequence_num) as oc FROM xxx.ORDER o; --175642
SELECT count(t._sequence_num) as tc  FROM  xxx.TRANSACTION t; --199392
SELECT count(ol._sequence_num) as olc  FROM  xxx.ORDER_LINES ol; --174947

It return just after 1-2 seconds (--xxxxxx in the right is the records count)

Same for this simple join, I never get the result:

SELECT ol.DEVICE_ID AS VIN,
       tx.TAX_LINES AS SKU,
       o.USER_ID AS ACCOUNT_DN,
       o.ORDER_NUMBER,
       cast(t.AMOUNT as FLOAT64)/100 AS TOTAL_AMOUNT ,
       t.TRANSACTION_STATUS,
       t.TRANSACTION_TYPE,
       t.TRANSACTION_TAG,
       t.CREATED_ON ,
       tx.TAX_CALCULATED,
       tx.TRANSACTION_STATUS AS TAX_TXN_STATUS,
       tx.ERROR_MESSAGE REMARKS,
       tx.TRANSACTION_ID AS TAX_TXN_ID,
       tx.TAXATION_TYPE AS TAX_TXN_TYPE,
       tx.TRANSACTION_DATE TAX_TXN_DATE
        FROM xxx.TAX_TRANSACTIONS tx join
         `xxx.ORDER` o on o.ORDER_NUMBER = tx.ORDER_NUMBER join
         xxx.TRANSACTION t on o.ORDER_NUMBER = t.ORDER_NUMBER join
         xxx.ORDER_LINES ol on o.ID = ol.ORDER_ID
        WHERE (t.TRANSACTION_TYPE IN ("purchase") AND t.TRANSACTION_STATUS ="approved" AND tx.TAXATION_TYPE = "SalesInvoice") or
        (t.TRANSACTION_TYPE IN ("refund") AND tx.TAXATION_TYPE = "ReturnInvoice") or
        (tx.TRANSACTION_STATUS IN ("Error"))
    ORDER BY CREATED_ON DESC

Is there something wrong with my query? Please let me know how to resolve the problem (joining). Thank you

CodePudding user response：

You say you're not doing any JOINs, but actually you are. Worse, you are doing CROSS JOINs. By putting 4 tables as you have done in your FROM clause you are implicitly joining all 4 of them together.

In other words, the number of rows produced by the join will be 202685 * 175642 * 199392 * 174947 = 1241835900000000000000 which is a humungous number. That's why your query doesn't complete.

Maybe take a look at the execution graph which is currently in preview (I can see it on your screenshot above) - it might give an indication into what operation is being performed here.

If you want COUNTs of the number of rows in each of those tables then you have to write 4 separate queries, as you have done.

UPDATE, as a demonstration I have a table that has 288 rows in it

select count(*)
from `project.dataset.t` a

returns 288

select count(*)
from `project.dataset.t` a,
`project.dataset.t` b

returns 82944

select count(*)
from `project.dataset.t` a,
`project.dataset.t` b,
`project.dataset.t` c

returns 23887872

select count(*)
from `project.dataset.t` a,
`project.dataset.t` b,
`project.dataset.t` c,
`project.dataset.t` d

returns 6879707136 (6.8billion). That's an enormous number, and that's for a table with only 288 rows in it. Your query will (as I said above) produce 1241835900000000000000 rows.

Here is the execution graph for my query that returns 6879707136: