Using IQueryable with record type - InvalidOperationException-CodePudding

I am getting an InvalidOperationException when using a projection into a record type (.Net 7). The query works fine if the where clause is chained. It also works if where clause is not chained and the projection is not into a record (class, anonymous type both ok).

public record TestDto(
    int CellLocationId, 
    string AreaCode
);

This works - Chained:

var query = _context.myTable.Select(s => new TestDto(s.CellLocationId, s.AreaCode)).Where(x => x.AreaCode == areaCode);

This fails - Unchained with record:

var query = _context.myTable.Select(s => new TestDto(s.CellLocationId, s.AreaCode));
query = query.Where(x => x.AreaCode == areaCode);

This works - Unchained with anonymous type:

var query = _context.myTable.Select(s => new {s.CellLocationId, s.AreaCode});
query = query.Where(x => x.AreaCode == areaCode);

When projecting into a record the sql shown in the error appears as:

System.InvalidOperationException: The LINQ expression 'DbSet().Where(v => new TestDto(v.CellLocationId, v.AreaCode).AreaCode == __areaCode_0)' could not be translated.

Why can I not use a record when my IQueryable is using an unchained where clause?

CodePudding user response：

About `IEnumerable` and `IQueryable`

IQueryable extends IEnumerable
IEnumerable most of the time works in memory, with some exceptions
IQueryable will be translated into SQL Query
Both IQueryable and IEnumerable execution are delayed until they are materialized (e.g. using ToList, AsEnumerable on IQueryable, etc.)

The API is designed this way for convenience in working with database queries. It allows (somewhat) interchangeable linq chains between the two. However, there are some caveats that need to be kept in mind.

IQueryable can only infer whats within its Expression. You can't for example do something like this:

public int GetAge(DateTime date)
{
    ...
}

var firstUserAge = _context.users.First().Select(q => GetDate(q.DoY));

It won't work, see QA. Linq is simply unable to translate the GetDate method into SQL Query.

The Problem

Let's take a look with the third case (anonymous class).

var query = _context.myTable
     .Select(s => new {s.CellLocationId, s.AreaCode});
query = query.Where(x => x.AreaCode == areaCode);

Up to _context.MyTable the only known type is simply DbSet<myTable>. However, as it reach the Select(s => ...) it became aware of the anonymous type. Linq the consider that query should be an IQueryable<..__AnonymousType..>. Given the complete information is given to it. Linq can translate it into (roughly) sql query shown below and the result snuggly fits the anonymous type.

select s.cellLocationId, s.AreaCode
from myTable as s
where s.AreaCode = ....

Now, let's see what happens in the first case which surprisingly works.

var query = _context.myTable
     .Select(s => new TestDto(s.CellLocationId, s.AreaCode))
     .Where(x => x.AreaCode == areaCode);

Hang on a second! Did not I previously mentioned that you can't call a method on your program's side from within an IQueryable?

Yes, your code above won't work in the past, see discussion. You can find numerous examples of similar problem. You can also see examples on EF Core 7 documentation that they used anonymous types all the time. The microsoft's official tutorial on DTO also assigns properties manually one-by-one without using constructor.

However, EF Core 7 is pretty new. It is said to have better performance compared to earlier EF Core. The changes may involve reworks on how it materialize IQueryable. There are plenty of issues need to be sorted out.

You should run query intercepts and see the queries being feed into the database to see the difference between the two cases above. I'd bet that the first case actually fetch everything in myTable.

What We Should Do

We can simply map into DTO at a later part of the code (at the very end of the chain).

var query = _context.MyTable;

if(someFlag)
{
    query = query.Where(s => s.AreaCode = areaCode);
}

//... a bunch more filters

return query.AsEnumerable()
   .Select(s => new TestDto(s.CellLocationId, s.AreaCode));

This way, you can avoid the bug altogether while keeping most of the heavy-lifting on the database.

You may complain "I will have duplicate code for computing AreaCode." You have to remember, DTO in the first place does not contain business logic.

CodePudding user response：

Regarding what Bagus Tesa wrote, the most important parts there are:

However, EF Core 7 is pretty new. It is said to have better performance compared to earlier EF Core. The changes may involve reworks on how it materialize IQueryable. There are plenty of issues need to be sorted out.

and

We can simply map into DTO at a later part of the code (at the very end of the chain).

I do not like how that answer is written, and I started writing my own, and then, frankly, I ended up with my own wall of text, and I don't like it just the same as their answer. Oh well. Maybe someone finds it useful. I mark it with community-wiki, because I don't really want to compete. It's more like, IDK, addendum?

Regarding the latter point, there have been some issues with early-selects since the dawn of EF. Select-as-custom-class is a special construct designed to allow your query to return your custom entity types (instead of anon bags of fields), especially in cases where you Join several tables, where the result cannot be simply the context.table you start the query from.

Thanks to how IQueryable is designed, you can moreless put such a .Select anywhere in your query, but there's a catch: once IQueryable is now <of-you-custom-type>, your custom properties are exposed instead of original tables columns, and it gets increasingly harder to guarantee that your code doesn't try to map/execute something untranslatable.

For example, imagine:

// case1:
var result = context.myTable
        .Where(x => x.AreaCode == areaCode)
        .Select<MyClass>(...)
        .ToList();

// case2:
var result = context.myTable
        .Select<MyClass>(...)
        .Where(x => x.AreaCode == areaCode)
        .ToList();

In the first case, it's obvious. AreaCode is a column from MyTable, defined by whatever type of MyTable.

In second case, it's AreaCode but from MyClass, and since someone wanted to introduce a new type, we can assume it's not mapped in EF mappings to the same table as MyTable. So if EF hanles that, it may have to so some guessing what MyClass.AreaCode really is.

Furthermore, what if MyClass.AreaCode was:

private string theCode = "5";
public string AreaCode {
    set { theCode = value; }
    get { return theCode   "1"; }
}

In that case, case1 would still function properly. And case2 can never function properly. Not even mentioning, custom class constructors, we write .Select(.. => new MyClas(...)) and it's very tempting to hide some custom logic into the ctor, even if it's as simple as if/throw/ ArgumentNullException.

If EF handles such cases it in ANY way (like: assume props/ctors are transparent, ignoring logic inside), results will be unintuitive to the programmer: MyClass used in Query will behave differently than MyClass used in the runtime. EF shouldn't handle it, it should throw, but determining that a getter or constructor is pass-through without custom logic isn't that easy.

That's why anon-types new {foo=, bar=} are always preferred (2) as the intermediate types - they are 100% compiler-generated, the compiler knows everything about them, the compiler knows all fields/properties are plain and transparent, and that's why it's not a huge problem if you put .Select<anon> even early in the LINQ query. (1)

But let's get back to your code.

In that record-related code samples you provided:

// first:
var query = _context.myTable
       .Select(s => new TestDto(s.CellLocationId, s.AreaCode))
       .Where(x => x.AreaCode == areaCode);

// second:
var query = _context.myTable
       .Select(s => new TestDto(s.CellLocationId, s.AreaCode));
query = query.Where(x => x.AreaCode == areaCode);

These pieces of code, if there is really nothing else between context:get_myTable and IQueryable<record>:Where, something that you'd for some reason omit, then these two pieces of code are essentially the same. Or should be.

When we write return " mom ".ToUpper().Trim(); we don't expect it to have different results than if we write var x = " mom ".ToUpper(); x = x.Trim(); return x;, right? That's all complete basics of operation/result/variable semantics.

The only difference here is splitting such unmaterialized IQueryable 'result' of a .Select() to a variable. LINQ call chain is exactly the same. Names are exactly the same. Difference is purely syntactical. So if you see that one of them works, and the other fails, this means there's a bug in the compiler, and you should:

file a bug issue at roslyn github
get away from whatever this bug can be related to as far as possible (meaning not use records yet, or at least not early in the query)
or, at least test the hell out of it, so you know how to use it safely even bugged, and get as much of those tests automated

There's really not much else to be advised here.

(1) actually sometimes it is problematic. Every ".Select" in the chain changes the "shape of the query", as they call it sometimes in MSDN articles, and it usually breaks any .Include setups that were done on the original shape. But YMMV there.

(2) anon-types are perfect here from the LINQ point of view, but are often a nightmare for the developer who wants to do anything a bit more reusable or structured. You cannot return them easily from your methods, since the type is anonymous. And so on. This severely limits reuse, and I can easily imagine, that's why RecordTypes are introduced - to provide a plain type with guaranteed mechanics, a type that can be named and easily referred to.

About IEnumerable and IQueryable

The Problem

What We Should Do

About `IEnumerable` and `IQueryable`