Can this query with duplicated WHERE clauses be optimized?-CodePudding

I have this query which selects the users for a given branch.

SELECT u.first, u.last, u.id, 
    (SELECT u1.user_type
    FROM users_branches AS u1
    WHERE u.id = u1.user_id AND u1.branch_id = @foo
    LIMIT 1)
FROM users AS u
WHERE EXISTS
    (SELECT 1
    FROM users_branches AS u0
    WHERE u.id = u0.user_id AND u0.branch_id = @foo)

And here is the EF Core LINQ query which produces this SQL.

int branchId = 123;
var users = await Users
    .Where(x => x.UsersBranches.Any(ub => ub.BranchId == branchId))
    .Select(user => new
    {
        UserId = user.Id,
        FirstName = user.First,
        LastName = user.Last,
        UserType = user.UsersBranches.FirstOrDefault(x => x.BranchId == branchId).UserType
    }).ToListAsync();

I don't like the WHERE clause needing to be duplicated just to say "get me only the users which have a users_branches record for this branch, and also return some of the data from that record". Is there a more efficient way to do this? I can imagine a solution that uses GROUP BY, but I'm not certain that would actually be any better.

CodePudding user response：

one way you can do this by using joins

SELECT u.id AS UserId, u.first AS FirstName, u.last AS LastName 
FROM users u
INNER JOIN users_branches ub ON u.id = ub.user_id
WHERE ub.branch_id = 123

another way you can do this by using subquery

SELECT u.id AS UserId, u.first AS FirstName, u.last AS LastName
FROM users u 
WHERE u.id IN (
  SELECT user_id 
  FROM users_branches 
  WHERE branch_id = 123
)

if you see main difference is that the join version queries both tables in one statement, while the subquery version filters UsersBranches first in a subquery before querying Users.

Note:The join is generally more efficient since it allows the database to access both tables at once and optimize the query plan. The subquery can result in slower performance since it splits the query into two parts.

-- hope this helps.

CodePudding user response：

I find this version easier to read:

SELECT users.first, users.last, users.id, MIN(users_branches.user_type)
FROM users
INNER JOIN users_branches ON
    users.id = users_branches.user_id
    AND users_branches.branch_id = @foo
GROUP BY users.id

as to whether it delivers better performance, that will depend on a number of factors, like indices etc. I recommend running some tests and examining the output of EXPLAIN SELECT .... for the different queries you try

CodePudding user response：

Maybe can you try something like this

SELECT u.first, u.last, u.id, u0.user_type
FROM users u
JOIN users_branches u0 ON u.id = u0.user_id AND u0.branch_id = @foo
WHERE NOT EXISTS (
    SELECT 1
    FROM users_branches u1
    WHERE u1.user_id = u0.user_id
    AND u1.branch_id = u0.branch_id
    AND u1.id < u0.id
)