I have a table that create the relationship between users and groups that looks like:
UserGroups
group_id | user_id |
---|---|
1 | 1 |
2 | 1 |
1 | 2 |
1 | 3 |
and the table users
looks like:
id | name |
---|---|
1 | John |
2 | Mary |
3 | Bob |
4 | Carol |
And I'm successfully selecting my results like this:
SELECT
...,
CASE
WHEN users.id in (SELECT user_id from usergoups where group_id = 1) and not in (SELECT user_id from usergoups where group_id = 2) then 1
WHEN users.id in (SELECT user_id from usergoups where group_id = 2) and not in (SELECT user_id from usergoups where group_id = 1) then 2
WHEN users.id in (SELECT user_id from usergoups where group_id = 1) and in (SELECT user_id from usergoups where group_id = 2) then 1
else class.group
end as class_group
...
LEFT JOIN users on users.id = 1
for the following sample output:
class_id | class_group |
---|---|
1 | 1 |
2 | 1 |
3 | 1 |
4 | 1 |
I was wondering if it is possible to simplify the select and make it more efficient reducing the number of selections, as i could just have something like:
...
LEFT JOIN usergroups on usergroups.user_id = users.id
...
The only line that i could think of was:
...
WHEN 2 in usergroups then...
but it just trows a sintax error as i kinda expected
CodePudding user response:
We could avoid correlated subqueries in expressions in the SELECT list, if we instead use an inline view (mysql calls it a derived table) to check for the existence of rows in user_groups
. Following this line in the outer query
LEFT JOIN users ON users.id = 1
I'd add an outer join to an inline view query something like this:
LEFT JOIN ( SELECT ug.user_id
, MAX(CASE ug.group_id WHEN 1 THEN ug.group_id ELSE NULL END) AS g1
, MAX(CASE ug.group_id WHEN 2 THEN ug.group_id ELSE NULL END) AS g2
FROM user_groups ug
WHERE ug.group_id IN (1,2)
GROUP
BY ug.user_id
) g
ON g.user_id = users.id
The view query will return one (or zero) rows for each user_id
in the user_groups
table, so it flattens to
user_id g1 g2
------- ---- ----
1 1 2
2 NULL 2
3 1 NULL
For improved performance, if the outer query is restricting results to just one user, e.g.
ON users.id = 1
We can include a matching restriction in the inline view query,
WHERE ug.group_id IN (1,2)
AND ug.user_id = 1 /* <-- add this to match the condition in the outer query */
In the SELECT list of the outer query, we can do the conditional checks,
CASE
WHEN g.g1 AND g.g2 IS NULL THEN 1
WHEN g.g2 AND g.g1 IS NULL THEN 2
WHEN g.g1 THEN 1
ELSE class.group
END AS foo
Note that the first WHEN
is redundant. If the first WHEN condition is TRUE, then we know the second WHEN condition cannot be TRUE, so we could check that first, and then the third WHEN condition covers the first. This would return an equivalent result,
CASE
WHEN g.g2 AND g.g1 IS NULL THEN 2
WHEN g.g1 THEN 1
ELSE class.group
END
If we weren't needing to return class.group
from the CASE expression, I'd be more tempted to move the logic down into the inline view. (We could move the checks for group_id 2 and group_id 1, and return 2 or 1 or NULL to the outer query, but we'd still need an expression to check the return from g to see if we return that, or return class.group
CodePudding user response:
Another way to avoid correlated subqueries in the SELECT list, the syntax is a bit cleaner than the approach in my other answer,
BUT we would only do this...
IF (and only IF) we have some guarantee that rows in user_groups
are UNIQUE i.e. if theres a PRIMARY KEY
or UNIQUE KEY
constraint on the (group_id,user_id)
tuple (or it could be the other way around (user_id,group_id)
)
THEN we could use two outer joins to the user_groups
table
So following this line:
LEFT JOIN users ON users.id = 1
We could add
LEFT JOIN user_groups g1 ON g1.group_id = 1 AND g1.user_id = users.id
LEFT JOIN user_groups g2 ON g2.group_id = 2 AND g2.user_id = users.id
Then in our expression in the SELECT list could be
CASE
WHEN g2.group_id AND g1.group_id IS NULL THEN 2
WHEN g1.group_id THEN 1
ELSE class.group
END AS class_group
NOTE: in MySQL, when a numeric is evaluated as a boolean, a value of 0 (zero) is FALSE, any non-zero value is TRUE, and NULL is NULL.
So the above is equivalent shorthand for
CASE
WHEN g2.group_id <> 0 AND g1.group_id IS NULL THEN 2
WHEN g1.group_id <> 0 THEN 1
ELSE class.group
END AS class_group
N.B. without guaranteed uniqueness, the JOINs have potential to match multiple rows from user_groups
(e.g. consider if there are two rows (1,1),(1,1)
), that would double the number of rows returned, which we probably do not want)
The approach in my other answer is not subject to the same uniqueness requirement; any duplicate rows for a (user_id,group_id)
will be collapsed into a single row for the user_id
by the GROUP BY
in the inline view.