I am trying to join the following tables in EF Core (SQL Server)
Table: Data
Id | Name | ParentId | ChildId |
---|---|---|---|
1 | Dina | 100 | 101 |
2 | Lucy | 100 | NULL |
3 | Sam | 100 | 102 |
4 | Ben | 200 | 201 |
5 | Max | 200 | 202 |
6 | Lily | 300 | NULL |
Table: Security
Id | UserId | ParentId | ChildId |
---|---|---|---|
1 | xyz | 100 | NULL |
2 | xyz | 200 | NULL |
3 | abc | 100 | 101 |
4 | abc | 200 | 202 |
Foreign keys are set up. In addition to the mentioned tables, there is a "Parents" and a "Childs" table.
Business logic
A user can either have access on parent level (ChildId = null) or on Child level (ChildId not null). Example
- User
xyz
has access to allData
records except toLily
(Data.Id = 6
). - User
abc
has only access toData
records withChildIds
101 and 202, i.e.:- Dina (Id = 1)
- Max (Id = 5)
T-SQL query
SELECT
*
FROM
Data AS d
INNER JOIN Security AS s ON
(
( s.ParentId = d.ParentId AND s.ChildId IS NULL )
OR
( s.ChildId = d.ChildId AND s.ChildId IS NOT NULL )
)
AND
s.UserId = 'xyz'
What's the best way to translate this into an EFCore lambda expression? The examples I found so far are all doing a simple join with one column only.
CodePudding user response:
I don't think using EF itself is an appropriate choice here, because EF still only works well with relationships between tables based on key-tuples (i.e.: foreign keys and primary keys), but your ParentId
and ChildId
columns are not keys, and Linq (even when used outside of EF) does not make it easy to perform any kind of JOIN
besides tuple-equality.
I think a better overall solution would be to define a FUNCTION
in your database that implements the Data
-Security
match logic, and then you can use that UDF from within EF (don't use a PROCEDURE
: table-valued FUNCTIONS
are much simpler and most important: they can be composed inside other queries, even in Linq EF queries, which you can't do with PROCEDURES
).
I don't know if you need a function that does a single access check (in which case use a scalar FUNCTION
) or if you want a table-valued function that returns the set of permitted UserId
values for a given Data.Id
- or the set of permitted Data.Id
values for a given UserId
.
Nevertheless, without using a FUNCTION
, and keeping things entirely in EF, means compromising your query to fit within the limitations of Linq's expressiveness - and EF's ability to generate SQL from that Linq expression (remember that EF and Linq are not the same thing).
So I think the best way is to make a UNION
of two inner-queries (aka derived-tables): the first half is the Security.ChildId IS NULL
results, the second half is the Security.ChildId IS NOT NULL
results.
In SQL this would be:
SELECT
d.*
FROM
dbo.Data AS d
INNER JOIN Security s ON
s.ParentId = d.ParentId
WHERE
s.ChildId IS NULL
AND
s.UserId = @userId
UNION ALL
SELECT
d.*
FROM
dbo.Data AS d
INNER JOIN Security s ON
s.ParentId = d.ParentId
AND
s.ChildId = d.ChildId
WHERE
s.ChildId IS NOT NULL
AND
s.UserId = @userId
Note how in the above query the INNER JOIN ON
expression is now an exact tuple equality and the other criteria (like s.ChildId IS NOT NULL
and s.UserId = @userId
) is moved to the WHERE
part. This shouldn't make a difference to the query's execution plan, but I think it makes the query conceptually easier to understand.
And so because the JOIN
criteria are now expressed only as tuple-equalities, it can now be converted to a Linq query.
public static async Task< List<Data> > GetDataThatUserCanAccessAsync( MyDbContext db, String userId, CancellationToken ct = default )
{
IQueryable<Data> firstHalf = db.Data
.Join(
inner: db.Security,
innerKeySelector: s => new { s.ParentId },
outerKeySelector: d => new { d.ParentId },
resultSelector: ( d, s ) => new { d, s }
)
.Where( t => t.s.ChildId == null )
.Where( t => t.s.UserId == userId )
.Select( t => t.d );
IQueryable<Data> secondHalf = db.Data
.Join(
inner: db.Security,
innerKeySelector: s => new { s.ParentId, s.ChildId },
outerKeySelector: d => new { d.ParentId, d.ChildId },
resultSelector: ( d, s ) => new { d, s }
)
.Where( t => t.s.ChildId != null )
.Where( t => t.s.UserId == userId )
.Select( t => t.d );
IQueryable<Data> query = firstHalf.Concat( secondHalf );
List<Data> results = await query.ToListAsync( ct ).ConfigureAwait(false);
return results;
}
When Linq is translated to SQL, there are some subtleties to be aware-of:
- SQL's
UNION
= Linq's.Concat
. - SQL's
UNION ALL
= Linq's.Union
.
I prefer to explicit parameter labels whenever I use .Join
in Linq because I keep on forgetting the exact order of the parameters, but it can be shorted down to just this:
(I strongly recommend you look at the generated SQL in both cases: it should be identical for both, but might also be different... and because this is a JOIN on non-key columns it's important that you ensure your database's INDEX
objects for the ParentId
and ChildId
are correct and the STATISTICS
objects are maintained (you are rebuilding them every week, right?), otherwise these queries (even in pure SQL without EF) will likely run very slowly if either table has more than a few hundred rows.
public static async Task< List<Data> > GetDataThatUserCanAccessAsync( MyDbContext db, String userId, CancellationToken ct = default )
{
List<Data> results = await db.Data
.Join( db.Security, s => new { s.ParentId }, d => new { d.ParentId }, ( d, s ) => new { d, s } )
.Where( t => t.s.ChildId == null && t.s.UserId == userId )
.Select( t => t.d )
.Concat(
db.Data
.Join( db.Security, s => new { s.ParentId, s.ChildId }, d => new { d.ParentId, d.ChildId }, ( d, s ) => new { d, s } )
.Where( t => t.s.ChildId != null && t.s.UserId == userId )
.Select( t => t.d )
)
.ToListAsync( ct )
.ConfigureAwait(false);
return results;
I'm really not a fan of how Linq's .Concat
and .Union
methods are extensions on IQueryable<T>
whereas in SQL they're top-level operators - which results in aesthetically unpleasing asymmetrical queries.
CodePudding user response:
I was writing something like @Dai's LINQ answer, but I used query syntax instead (which I prefer over fluent syntax for joins). Adapting the query portion of @Dai's solution, it would look like this (note my comments):
var firstHalf =
from d in db.Data
join s in db.Security
on s.ParentId equals d.ParentId
where s.ChildId == null && s.UserId == userId
select d; // or new { d, s } <-- if you want to capture both data and security parts
var secondHalf =
from d in db.Data
join s in db.Security
on new { d.ParentId, d.ChildId }
equals new { s.Parentid, s.ChildId }
where s.ChildId != null // <-- this where may not be necessary if d.ChildId must be nonnull
select d; // or new { d, s } if you want both Data and Security parts
var query = firstHalf.Concat(secondHalf);
Lastly, I will strongly suggest checking the generated SQL and its execution plan. I've been surprised by both. If you do not check, you may be surprised by the complexity of the generated SQL and performance differences.