I have a simple query that count records from 4 tables (NO JOINING):
SELECT count(tx._sequence_num) as txc,
count(o._sequence_num) as oc,
count(t._sequence_num) as tc,
count(ol._sequence_num) as olc
FROM `xxx.TAX_TRANSACTIONS` tx,
xxx.ORDER o,
xxx.TRANSACTION t,
xxx.ORDER_LINES ol
If I separate it to 4 queries like that:
SELECT count(tx._sequence_num) as txc FROM `xxx.TAX_TRANSACTIONS` tx; --202685
SELECT count(o._sequence_num) as oc FROM xxx.ORDER o; --175642
SELECT count(t._sequence_num) as tc FROM xxx.TRANSACTION t; --199392
SELECT count(ol._sequence_num) as olc FROM xxx.ORDER_LINES ol; --174947
It return just after 1-2 seconds (--xxxxxx in the right is the records count)
Same for this simple join, I never get the result:
SELECT ol.DEVICE_ID AS VIN,
tx.TAX_LINES AS SKU,
o.USER_ID AS ACCOUNT_DN,
o.ORDER_NUMBER,
cast(t.AMOUNT as FLOAT64)/100 AS TOTAL_AMOUNT ,
t.TRANSACTION_STATUS,
t.TRANSACTION_TYPE,
t.TRANSACTION_TAG,
t.CREATED_ON ,
tx.TAX_CALCULATED,
tx.TRANSACTION_STATUS AS TAX_TXN_STATUS,
tx.ERROR_MESSAGE REMARKS,
tx.TRANSACTION_ID AS TAX_TXN_ID,
tx.TAXATION_TYPE AS TAX_TXN_TYPE,
tx.TRANSACTION_DATE TAX_TXN_DATE
FROM xxx.TAX_TRANSACTIONS tx join
`xxx.ORDER` o on o.ORDER_NUMBER = tx.ORDER_NUMBER join
xxx.TRANSACTION t on o.ORDER_NUMBER = t.ORDER_NUMBER join
xxx.ORDER_LINES ol on o.ID = ol.ORDER_ID
WHERE (t.TRANSACTION_TYPE IN ("purchase") AND t.TRANSACTION_STATUS ="approved" AND tx.TAXATION_TYPE = "SalesInvoice") or
(t.TRANSACTION_TYPE IN ("refund") AND tx.TAXATION_TYPE = "ReturnInvoice") or
(tx.TRANSACTION_STATUS IN ("Error"))
ORDER BY CREATED_ON DESC
Is there something wrong with my query? Please let me know how to resolve the problem (joining). Thank you
CodePudding user response:
You say you're not doing any JOINs, but actually you are. Worse, you are doing CROSS JOINs. By putting 4 tables as you have done in your FROM clause you are implicitly joining all 4 of them together.
In other words, the number of rows produced by the join will be 202685 * 175642 * 199392 * 174947 = 1241835900000000000000 which is a humungous number. That's why your query doesn't complete.
Maybe take a look at the execution graph which is currently in preview (I can see it on your screenshot above) - it might give an indication into what operation is being performed here.
If you want COUNTs of the number of rows in each of those tables then you have to write 4 separate queries, as you have done.
UPDATE, as a demonstration I have a table that has 288 rows in it
select count(*)
from `project.dataset.t` a
returns 288
select count(*)
from `project.dataset.t` a,
`project.dataset.t` b
returns 82944
select count(*)
from `project.dataset.t` a,
`project.dataset.t` b,
`project.dataset.t` c
returns 23887872
select count(*)
from `project.dataset.t` a,
`project.dataset.t` b,
`project.dataset.t` c,
`project.dataset.t` d
returns 6879707136 (6.8billion). That's an enormous number, and that's for a table with only 288 rows in it. Your query will (as I said above) produce 1241835900000000000000 rows.
Here is the execution graph for my query that returns 6879707136: