I need your help on deciding which query to use since we are facing performance issue with MySQL joins and Subqueries.
The problem is that I'm trying to find out user's 'first order date' while they should fit certain conditions:
order_status = 1(completed) or order_status = 2(canceled)
The Tables are tb_order and tb_user; All the columns that contain a 'time' are using Unix Time Stamp.
The result I need looks like this:
order_id | user_id | user_1st_order_date |
---|---|---|
1 | 47 | 1666876594 |
2 | 982 | 1667095997 |
Option 1: JOIN
Select
o.id as 'order_id',
u.id as 'user_id',
ox.create_time as 'user_1st_order_date'
from
tb_order o
left join tb_user u on o.user_id = u.id
/* here I have about 10 joins */
left join
(
select
ux.id,
ox.create_time
from
tb_user u
left join tb_order ox on ox.user_id = u.id
where
( ox.order_status = 1 or ox.order_status = 2 )
/* Orders can be (completed) or (canceled) */
group by
ux.id
) x on x.id = u.id
/* The thought here is by using group by `ux.id` I will get the
user's earliest completed or canceled order and it's `create_time`
then this can be used to `join` the order info */
where
o.create_time != 0
and
( o.order_status = 1 or o.order_status = 2 )
group by
o.id
Option 2: Subquery
Select
o.id as 'order_id',
u.id as 'user_id',
(
select
ox.create_time
from
tb_order ox
where
(ox.order_status = 1 or ox.order_status = 2)
and
ox.user_id = u.id
order by
ox.id asc
limit 1
) as 'user_1st_order_date'
from
tb_order o
left join tb_user u on o.user_id = u.id
/* here I have about 10 joins */
where
o.create_time != 0
and
( o.order_status = 1 or o.order_status = 2 )
group by
o.id
/* Option 1 stopped working somehow yesterday and start to give me the latest order time instead, and I don't know why. Though I can get the correct date back by putting 'Min()' in front of the ox.create_time */
left join
(
select
ux.id,
Min(ox.create_time)
Both worked but I'm trying to find the most efficient one since I'll use this on a daily basis to update our data source for Tableau Online.
Many thanks in advance.
CodePudding user response:
Just looking at query 1, you have set out a crazy set of table relationships.
- Starting with the Select in parentheses, you have a Left Join that implies there are users without orders. That's OK, but your Where filter is based solely on order status, which is NULL when there is no order, so all such users will be filtered out. There is no useful purpose being served by joining the tb_user table and it can be omitted from that subquery.
- In the outer query the Left join of tb_order to tb_user implies there are orders without users, but then joining the subquery using u.id instead of o.userid guarantees that nothing from the subquery will be usable in that case. Once again, there is no purpose served in bring tb_user in there either.
To get the desired result set you set out above, you can vastly simplify things by looking only at the tb_order table like Option 3 below:
Option 3
Select * From (
Select id as 'order_id', user_id as 'user_id'
,min(Case When order_status In (1,2) Then create_time End)
Over (Partition By user_id
Between unbounded preceding And unbounded following)
AS 'user_1st_order_date'
From tb_order
)
Where order_status in (1,2)
Order by order_id
This can be further simplified by moving the Where order_status in (1,2)
inside the inner query and removing the Case statement around the created_date, but it's less adaptable to use within other queries.