Home > Enterprise >  Google bigquery is extremely slow on simple query
Google bigquery is extremely slow on simple query

Time:11-05

I have a simple query that count records from 4 tables (NO JOINING):

SELECT count(tx._sequence_num) as txc, 
count(o._sequence_num) as oc, 
count(t._sequence_num) as tc, 
count(ol._sequence_num) as olc  
FROM `xxx.TAX_TRANSACTIONS` tx, 
xxx.ORDER o, 
xxx.TRANSACTION t, 
xxx.ORDER_LINES ol

It never returns result to me enter image description here

If I separate it to 4 queries like that:

SELECT count(tx._sequence_num) as txc FROM `xxx.TAX_TRANSACTIONS` tx; --202685
SELECT count(o._sequence_num) as oc FROM xxx.ORDER o; --175642
SELECT count(t._sequence_num) as tc  FROM  xxx.TRANSACTION t; --199392
SELECT count(ol._sequence_num) as olc  FROM  xxx.ORDER_LINES ol; --174947

It return just after 1-2 seconds (--xxxxxx in the right is the records count)

Same for this simple join, I never get the result:

SELECT ol.DEVICE_ID AS VIN,
       tx.TAX_LINES AS SKU,
       o.USER_ID AS ACCOUNT_DN,
       o.ORDER_NUMBER,
       cast(t.AMOUNT as FLOAT64)/100 AS TOTAL_AMOUNT ,
       t.TRANSACTION_STATUS,
       t.TRANSACTION_TYPE,
       t.TRANSACTION_TAG,
       t.CREATED_ON ,
       tx.TAX_CALCULATED,
       tx.TRANSACTION_STATUS AS TAX_TXN_STATUS,
       tx.ERROR_MESSAGE REMARKS,
       tx.TRANSACTION_ID AS TAX_TXN_ID,
       tx.TAXATION_TYPE AS TAX_TXN_TYPE,
       tx.TRANSACTION_DATE TAX_TXN_DATE
        FROM xxx.TAX_TRANSACTIONS tx join
         `xxx.ORDER` o on o.ORDER_NUMBER = tx.ORDER_NUMBER join
         xxx.TRANSACTION t on o.ORDER_NUMBER = t.ORDER_NUMBER join
         xxx.ORDER_LINES ol on o.ID = ol.ORDER_ID
        WHERE (t.TRANSACTION_TYPE IN ("purchase") AND t.TRANSACTION_STATUS ="approved" AND tx.TAXATION_TYPE = "SalesInvoice") or
        (t.TRANSACTION_TYPE IN ("refund") AND tx.TAXATION_TYPE = "ReturnInvoice") or
        (tx.TRANSACTION_STATUS IN ("Error"))
    ORDER BY CREATED_ON DESC

Is there something wrong with my query? Please let me know how to resolve the problem (joining). Thank you

CodePudding user response:

You say you're not doing any JOINs, but actually you are. Worse, you are doing CROSS JOINs. By putting 4 tables as you have done in your FROM clause you are implicitly joining all 4 of them together.

In other words, the number of rows produced by the join will be 202685 * 175642 * 199392 * 174947 = 1241835900000000000000 which is a humungous number. That's why your query doesn't complete.

Maybe take a look at the execution graph which is currently in preview (I can see it on your screenshot above) - it might give an indication into what operation is being performed here.

If you want COUNTs of the number of rows in each of those tables then you have to write 4 separate queries, as you have done.


UPDATE, as a demonstration I have a table that has 288 rows in it

select count(*)
from `project.dataset.t` a

returns 288

select count(*)
from `project.dataset.t` a,
`project.dataset.t` b

returns 82944

select count(*)
from `project.dataset.t` a,
`project.dataset.t` b,
`project.dataset.t` c

returns 23887872

select count(*)
from `project.dataset.t` a,
`project.dataset.t` b,
`project.dataset.t` c,
`project.dataset.t` d

returns 6879707136 (6.8billion). That's an enormous number, and that's for a table with only 288 rows in it. Your query will (as I said above) produce 1241835900000000000000 rows.

Here is the execution graph for my query that returns 6879707136:

enter image description here

  • Related