I have three tables (simplified for the sake of the question)
users
id | name |
---|---|
4 | John Doe |
5 | Jane Doe |
order_logs
id | user_id | order_id | event |
---|---|---|---|
1 | 4 | 1 | OrderConfirmed |
2 | 4 | 2 | OrderConfirmed |
3 | null | 1 | OrderDelivered |
4 | 5 | 3 | OrderCanceled |
order_items
id | order_id | pricing_schema |
---|---|---|
1 | 1 | regular |
2 | 3 | sale |
3 | 1 | upsell |
What I'm trying to do, is to get how many orders each user confirmed, delivered, etc...
an order may contain an upsell item (pricing_schema = 'upsell')
the order_logs
table is where the state of the order is saved (OrderConfirmed, OrderDelivered, etc...)
I ran the below query and get the correct numbers, for example:
id | name | confirmed | delivered | upsell |
---|---|---|---|---|
4 | John doe | 2 | 1 | 0 |
But if I pick up an order this user handled (confirmed by them), and add an upsell item to it, the count of each column gets incremented by 1
id | name | confirmed | delivered | upsell |
---|---|---|---|---|
4 | John doe | 3 | 2 | 1 |
I noticed that if I remove the last left join order_items
I get the correct result but without the upsell count obviously.
Can anyone spot what's going wrong here?
Thank you.
select
users.id,
users.name,
COUNT(
CASE
WHEN log_1.event = 'OrderConfirmed' THEN 1
END
) as confirmed,
COUNT(
CASE
WHEN log_2.event = 'OrderDelivered' THEN 1
END
) as delivered,
COUNT(
CASE
WHEN log_2.event = 'OrderDelivered'
AND order_items.pricing_schema = 'upsell' THEN 1
END
) as upsells
from
users
left join order_logs as log_1 on log_1.user_id = users.id
and log_1.event in ('OrderConfirmed', 'OrderCanceled', 'OrderFailed')
left join order_logs as log_2 on log_2.order_id = log_1.order_id
and log_1.user_id = users.id
and log_2.event in ('OrderDelivered', 'OrderReturning')
left join order_items on order_items.order_id = log_2.order_id
and order_items.pricing_schema = 'upsell'
group by
users.id
CodePudding user response:
Answer is for Postgresql but should work in other SQL's (assuming this joins are correct)
select
users.id,
users.name,
COUNT(
distinct CASE
WHEN log_1.event = 'OrderConfirmed' THEN log_1.id
END
) as confirmed,
COUNT(
distinct CASE
WHEN log_2.event = 'OrderDelivered' THEN log_2.id
END
) as delivered,
COUNT(
distinct CASE
WHEN log_2.event = 'OrderDelivered'
AND order_items.pricing_schema = 'upsell' THEN order_items.id
END
) as upsells
from
users
left join order_logs as log_1 on log_1.user_id = users.id
and log_1.event in ('OrderConfirmed', 'OrderCanceled', 'OrderFailed')
left join order_logs as log_2 on log_2.order_id = log_1.order_id
and log_1.user_id = users.id
and log_2.event in ('OrderDelivered', 'OrderReturning')
left join order_items on order_items.order_id = log_2.order_id
and order_items.pricing_schema = 'upsell'
group by
users.id, users.name
I don't know structure, so maybe columns used for distinct should be changed from id
to order_id
CodePudding user response:
This join is strange.... why are you putting in a join for log_1 here
left join order_logs as log_2 on log_2.order_id = log_1.order_id
and log_1.user_id = users.id
and log_2.event in ('OrderDelivered', 'OrderReturning')
This should look like this
left join order_logs as log_2 on log_2.user_id = user.order_id
and log_2.event in ('OrderDelivered', 'OrderReturning')
But the whole thing seems strange to me -- why are you doing two joins to the logs table -- there is no reason for it that I can see. Also, SUM will work better than COUNT in almost every case. I'd write the whole query like this
select
users.id,
users.name,
SUM(CASE WHEN logs.event = 'OrderConfirmed' THEN 1 ELSE 0 END) AS confirmed,
SUM(CASE WHEN logs.event = 'OrderDelivered' THEN 1 ELSE 0 END) AS delivered,
SUM(CASE WHEN logs.event = 'OrderDelivered'
AND items.pricing_schema = 'upsell' THEN 1 ELSE 0 END) AS upsells
FROM users
LEFT JOIN order_logs logs ON log.user_id = users.id
LEFT JOIN order_items items ON items.order_id = logs.order_id
GROUP BY users.id, users.name
Here we only join to the logs table once and only join to the items table once. I also use SUM not COUNT. COUNT will ignore NULLS (which was a trick that you were using) but it might not be null and then you have an extra count. Between these two changes you should be good.
CodePudding user response:
I believe that because you are joining on via the pricing schema 'upsell' in a way that allows duplicate data (as something that is confirmed and delivered can also be upsold) and so gets joined twice.
Have you tried simply joining users to logs and then order_logs to order_items? At a glance, I think your SQL might be okay then (once you've redone the aliases)
Something like:
select
users.id,
users.name,
COUNT(
CASE
WHEN order_logs.event = 'OrderConfirmed' THEN 1
END
) AS confirmed,
COUNT(
CASE
WHEN order_logs.event = 'OrderDelivered' THEN 1
END
) AS delivered,
COUNT(
CASE
WHEN order_logs.event = 'OrderDelivered'
AND order_items.pricing_schema = 'upsell' THEN 1
END
) AS upsells
FROM users
LEFT JOIN order_logs ON log.user_id = users.id
LEFT JOIN order_items ON order_items.order_id = order_logs.order_id
GROUP BY
users.id
Based on comment and assuming there is only one order per user
select
users.id,
users.name,
COUNT(
CASE
WHEN order_logs.event = 'OrderConfirmed' THEN 1
END
) AS confirmed,
COUNT(
CASE
WHEN order_logs.event = 'OrderDelivered' THEN 1
END
) AS delivered,
COUNT(
CASE
WHEN order_logs.event = 'OrderDelivered'
AND order_items.pricing_schema = 'upsell' THEN 1
END
) AS upsells
FROM users
LEFT JOIN ( -- normalize order logs to have user id for every row
SELECT id, COALESCE(user_id, MAX(user_id) OVER (PARTION BY order_id)) as user_id, order_id, event
FROM order_logs
) order_logs ON log.user_id = users.id
LEFT JOIN order_items ON order_items.order_id = order_logs.order_id
GROUP BY
users.id