I need to join 2 tables with the 2nd table with JOIN condition that is based on the 3rd JOIN table.
The problem is, I couldn't get the value from 3rd JOIN table while still joining the 2nd table.
Table A: users
-------
user_id INT PRIMARY
Table B: orders
-------
order_id INT PRIMARY
user_id INT
invoice_id INT
Table C: invoices
-------
invoice_id INT PRIMARY
invoice_status VARCHAR [voided, paid]
This is what I wish to do:
SELECT
A.user_id,
B.order_id,
C.invoice_id,
C.invoice_status
FROM users A
LEFT JOIN orders B
ON (B.user_id = A.user_id
AND C.invoice_status = 'paid')
LEFT JOIN invoices C
ON (C.invoice_id = B.invoice_id)
The 9th line AND C.invoice_status = 'paid')
is referring to table C which has not been joined yet, so this query will cause Error in query (1054): Unknown column 'C.invoice_status' in 'on clause'
The reason I cannot have this condition in WHERE
clause is because I still want to return all users records regardless of whether they have any order or 'paid' invoice or not. So adding WHERE invoice_status = 'paid'
will NOT return users without any order and users with invoice_status
= 'voided'.
Also, when a user has two orders linked to two different invoice records, but 1 invoice's invoice_status
is 'paid', while the other is 'voided', I only want to return the 'paid' record. There can be many voided
invoices, but only 1 'paid' invoice record.
By the way, this is enforced: sql_mode=only_full_group_by
, so I can't do GROUP BY user_id
without having some kind of aggregated or condition on order_id
& invoice_id
fields.
SELECT
A.user_id,
B.order_id,
C.invoice_id,
C.invoice_status
FROM users A
LEFT JOIN orders B
ON (B.user_id = A.user_id)
LEFT JOIN invoices C
ON (C.invoice_id = B.invoice_id);
user_id | order_id | invoice_id | invoice_status |
---|---|---|---|
1 | 1 | 1 | voided |
1 | 2 | 2 | paid |
2 | 3 | 1 | voided |
3 | NULL | NULL | NULL |
My desired result:
user_id | order_id | invoice_id | invoice_status |
---|---|---|---|
1 | 2 | 2 | paid |
2 | 3 | 1 | voided |
3 | NULL | NULL | NULL |
Each user_id
must only return once, with invoice_status
= 'paid' being the preferred row when there are multiple related orders.
Appreciate it if anyone has any idea how to achieve this.
Thanks!
CodePudding user response:
Use ROW_NUMBER()
function And number the rows according to user_id
and sort according to invoice_status
then fetch the first row
SELECT user_id,
order_id,
invoice_id,
invoice_status
FROM
(SELECT
A.user_id,
B.order_id,
C.invoice_id,
C.invoice_status,
ROW_NUMBER() OVER(PARTITION BY A.user_id ORDER BY C.invoice_status) AS num
FROM users A
LEFT JOIN orders B
ON (B.user_id = A.user_id)
LEFT JOIN invoices C
ON (C.invoice_id = B.invoice_id)) t
WHERE num = 1
CodePudding user response:
I am able to solve it by using EXISTS
thanks to ProGu's comment.
SELECT
A.user_id,
B.order_id,
C.invoice_id,
C.invoice_status
FROM users A
LEFT JOIN orders B
ON B.user_id = A.user_id
AND EXISTS( SELECT
1
FROM
invoices
WHERE
invoices.invoice_id = B.invoice_id
AND invoices.invoice_status = 'paid')
LEFT JOIN invoices C
ON C.invoice_id = B.invoice_id
user_id | order_id | invoice_id | invoice_status |
---|---|---|---|
1 | 2 | 2 | paid |
2 | NULL | NULL | NULL |
3 | NULL | NULL | NULL |
Thanks!