Home > Enterprise >  Get the correct count when joining three tables
Get the correct count when joining three tables

Time:06-21

I have three tables (simplified for the sake of the question)

users

id name
4 John Doe
5 Jane Doe

order_logs

id user_id order_id event
1 4 1 OrderConfirmed
2 4 2 OrderConfirmed
3 null 1 OrderDelivered
4 5 3 OrderCanceled

order_items

id order_id pricing_schema
1 1 regular
2 3 sale
3 1 upsell

What I'm trying to do, is to get how many orders each user confirmed, delivered, etc...

an order may contain an upsell item (pricing_schema = 'upsell')

the order_logs table is where the state of the order is saved (OrderConfirmed, OrderDelivered, etc...)

I ran the below query and get the correct numbers, for example:

id name confirmed delivered upsell
4 John doe 2 1 0

But if I pick up an order this user handled (confirmed by them), and add an upsell item to it, the count of each column gets incremented by 1

id name confirmed delivered upsell
4 John doe 3 2 1

I noticed that if I remove the last left join order_items I get the correct result but without the upsell count obviously.

Can anyone spot what's going wrong here?

Thank you.

select
    users.id,
    users.name,
    COUNT(
        CASE
            WHEN log_1.event = 'OrderConfirmed' THEN 1
        END
    ) as confirmed,
    COUNT(
        CASE
            WHEN log_2.event = 'OrderDelivered' THEN 1
        END
    ) as delivered,
    COUNT(
        CASE
            WHEN log_2.event = 'OrderDelivered'
            AND order_items.pricing_schema = 'upsell' THEN 1
        END
    ) as upsells
from
    users
    left join order_logs as log_1 on log_1.user_id = users.id
    and log_1.event in ('OrderConfirmed', 'OrderCanceled', 'OrderFailed')
    
    left join order_logs as log_2 on log_2.order_id = log_1.order_id 
    and log_1.user_id = users.id
    and log_2.event in ('OrderDelivered', 'OrderReturning')
    
    left join order_items on order_items.order_id = log_2.order_id
    and order_items.pricing_schema = 'upsell'
    
group by
    users.id

CodePudding user response:

Answer is for Postgresql but should work in other SQL's (assuming this joins are correct)

select
    users.id,
    users.name,
    COUNT(
        distinct CASE
            WHEN log_1.event = 'OrderConfirmed' THEN log_1.id
        END
    ) as confirmed,
    COUNT(
        distinct CASE
            WHEN log_2.event = 'OrderDelivered' THEN log_2.id
        END
    ) as delivered,
    COUNT(
        distinct CASE
            WHEN log_2.event = 'OrderDelivered'
            AND order_items.pricing_schema = 'upsell' THEN order_items.id
        END
    ) as upsells
from
    users
    left join order_logs as log_1 on log_1.user_id = users.id
    and log_1.event in ('OrderConfirmed', 'OrderCanceled', 'OrderFailed')
    
    left join order_logs as log_2 on log_2.order_id = log_1.order_id 
    and log_1.user_id = users.id
    and log_2.event in ('OrderDelivered', 'OrderReturning')
    
    left join order_items on order_items.order_id = log_2.order_id
    and order_items.pricing_schema = 'upsell'
    
group by
    users.id, users.name

I don't know structure, so maybe columns used for distinct should be changed from id to order_id

CodePudding user response:

This join is strange.... why are you putting in a join for log_1 here

left join order_logs as log_2 on log_2.order_id = log_1.order_id 
and log_1.user_id = users.id
and log_2.event in ('OrderDelivered', 'OrderReturning')

This should look like this

left join order_logs as log_2 on log_2.user_id = user.order_id 
and log_2.event in ('OrderDelivered', 'OrderReturning')

But the whole thing seems strange to me -- why are you doing two joins to the logs table -- there is no reason for it that I can see. Also, SUM will work better than COUNT in almost every case. I'd write the whole query like this

select
  users.id,
  users.name,
  SUM(CASE WHEN logs.event = 'OrderConfirmed' THEN 1 ELSE 0 END) AS confirmed,
  SUM(CASE WHEN logs.event = 'OrderDelivered' THEN 1 ELSE 0 END) AS delivered,
  SUM(CASE WHEN logs.event = 'OrderDelivered'
            AND items.pricing_schema = 'upsell' THEN 1 ELSE 0 END) AS upsells
FROM users
LEFT JOIN order_logs logs ON log.user_id = users.id        
LEFT JOIN order_items items ON items.order_id = logs.order_id
GROUP BY users.id, users.name

Here we only join to the logs table once and only join to the items table once. I also use SUM not COUNT. COUNT will ignore NULLS (which was a trick that you were using) but it might not be null and then you have an extra count. Between these two changes you should be good.

CodePudding user response:

I believe that because you are joining on via the pricing schema 'upsell' in a way that allows duplicate data (as something that is confirmed and delivered can also be upsold) and so gets joined twice.

Have you tried simply joining users to logs and then order_logs to order_items? At a glance, I think your SQL might be okay then (once you've redone the aliases)

Something like:

select
users.id,
users.name,
COUNT(
    CASE
        WHEN order_logs.event = 'OrderConfirmed' THEN 1
    END
) AS confirmed,
COUNT(
    CASE
        WHEN order_logs.event = 'OrderDelivered' THEN 1
    END
) AS delivered,
COUNT(
    CASE
        WHEN order_logs.event = 'OrderDelivered'
        AND order_items.pricing_schema = 'upsell' THEN 1
    END
) AS upsells
FROM users
LEFT JOIN order_logs ON log.user_id = users.id        
LEFT JOIN order_items ON order_items.order_id = order_logs.order_id
GROUP BY
    users.id

Based on comment and assuming there is only one order per user

select
  users.id,
  users.name,
  COUNT(
    CASE
        WHEN order_logs.event = 'OrderConfirmed' THEN 1
    END
  ) AS confirmed,
  COUNT(
    CASE
        WHEN order_logs.event = 'OrderDelivered' THEN 1
    END
  ) AS delivered,
  COUNT(
    CASE
        WHEN order_logs.event = 'OrderDelivered'
        AND order_items.pricing_schema = 'upsell' THEN 1
    END
  ) AS upsells
  FROM users
  LEFT JOIN ( -- normalize order logs to have user id for every row
     SELECT id, COALESCE(user_id, MAX(user_id) OVER (PARTION BY order_id)) as user_id, order_id, event
     FROM order_logs
  ) order_logs ON log.user_id = users.id        
  LEFT JOIN order_items ON order_items.order_id = order_logs.order_id
  GROUP BY
    users.id
  •  Tags:  
  • sql
  • Related