Home > other >  How to simplify a case when *condition* in (select
How to simplify a case when *condition* in (select

Time:10-14

I have a table that create the relationship between users and groups that looks like:

UserGroups

group_id user_id
1 1
2 1
1 2
1 3

and the table users looks like:

id name
1 John
2 Mary
3 Bob
4 Carol

And I'm successfully selecting my results like this:

SELECT
...,
CASE
    WHEN users.id in (SELECT user_id from usergoups where group_id = 1) and not in (SELECT user_id from usergoups where group_id = 2) then 1
    WHEN users.id in (SELECT user_id from usergoups where group_id = 2) and not in (SELECT user_id from usergoups where group_id = 1) then 2
    WHEN users.id in (SELECT user_id from usergoups where group_id = 1) and in (SELECT user_id from usergoups where group_id = 2) then 1
    else class.group
end as class_group
...
LEFT JOIN users on users.id = 1

for the following sample output:

class_id class_group
1 1
2 1
3 1
4 1

I was wondering if it is possible to simplify the select and make it more efficient reducing the number of selections, as i could just have something like:

...
LEFT JOIN usergroups on usergroups.user_id = users.id
...

The only line that i could think of was:

...
WHEN 2 in usergroups then...

but it just trows a sintax error as i kinda expected

CodePudding user response:

We could avoid correlated subqueries in expressions in the SELECT list, if we instead use an inline view (mysql calls it a derived table) to check for the existence of rows in user_groups. Following this line in the outer query

    LEFT JOIN users ON users.id = 1

I'd add an outer join to an inline view query something like this:

    LEFT JOIN ( SELECT ug.user_id
                     , MAX(CASE ug.group_id WHEN 1 THEN ug.group_id ELSE NULL END) AS g1
                     , MAX(CASE ug.group_id WHEN 2 THEN ug.group_id ELSE NULL END) AS g2
                  FROM user_groups ug
                 WHERE ug.group_id IN (1,2)
                 GROUP
                    BY ug.user_id
              ) g
           ON g.user_id = users.id

The view query will return one (or zero) rows for each user_id in the user_groups table, so it flattens to

         user_id    g1    g2
         -------  ----  ----
               1     1     2
               2  NULL     2
               3     1  NULL

For improved performance, if the outer query is restricting results to just one user, e.g.

    ON users.id = 1

We can include a matching restriction in the inline view query,

                 WHERE ug.group_id IN (1,2)
                   AND ug.user_id = 1         /* <-- add this to match the condition in the outer query */

In the SELECT list of the outer query, we can do the conditional checks,

          CASE
          WHEN g.g1 AND g.g2 IS NULL THEN 1
          WHEN g.g2 AND g.g1 IS NULL THEN 2
          WHEN g.g1                  THEN 1
          ELSE class.group
          END AS foo

Note that the first WHEN is redundant. If the first WHEN condition is TRUE, then we know the second WHEN condition cannot be TRUE, so we could check that first, and then the third WHEN condition covers the first. This would return an equivalent result,

          CASE
          WHEN g.g2 AND g.g1 IS NULL THEN 2
          WHEN g.g1                  THEN 1
          ELSE class.group
          END

If we weren't needing to return class.group from the CASE expression, I'd be more tempted to move the logic down into the inline view. (We could move the checks for group_id 2 and group_id 1, and return 2 or 1 or NULL to the outer query, but we'd still need an expression to check the return from g to see if we return that, or return class.group

CodePudding user response:

Another way to avoid correlated subqueries in the SELECT list, the syntax is a bit cleaner than the approach in my other answer,

BUT we would only do this...

IF (and only IF) we have some guarantee that rows in user_groups are UNIQUE i.e. if theres a PRIMARY KEY or UNIQUE KEY constraint on the (group_id,user_id) tuple (or it could be the other way around (user_id,group_id))

THEN we could use two outer joins to the user_groups table

So following this line:

    LEFT JOIN users ON users.id = 1

We could add

    LEFT JOIN user_groups g1 ON g1.group_id = 1 AND g1.user_id = users.id
    LEFT JOIN user_groups g2 ON g2.group_id = 2 AND g2.user_id = users.id

Then in our expression in the SELECT list could be

    CASE
    WHEN g2.group_id AND g1.group_id IS NULL THEN 2
    WHEN g1.group_id THEN 1
    ELSE class.group
    END AS class_group

NOTE: in MySQL, when a numeric is evaluated as a boolean, a value of 0 (zero) is FALSE, any non-zero value is TRUE, and NULL is NULL.

So the above is equivalent shorthand for

    CASE
    WHEN g2.group_id <> 0 AND g1.group_id IS NULL THEN 2
    WHEN g1.group_id <> 0 THEN 1
    ELSE class.group
    END AS class_group

N.B. without guaranteed uniqueness, the JOINs have potential to match multiple rows from user_groups (e.g. consider if there are two rows (1,1),(1,1) ), that would double the number of rows returned, which we probably do not want)

The approach in my other answer is not subject to the same uniqueness requirement; any duplicate rows for a (user_id,group_id) will be collapsed into a single row for the user_id by the GROUP BY in the inline view.

  • Related