I have the following query which I run on my sqlite DB:
SELECT * FROM ServicesData
WHERE JobID IN (
SELECT JobID
FROM ServicesData
WHERE JobID IS NOT NULL
GROUP BY JobID
ORDER BY MAX(Time) DESC LIMIT 3
)
ORDER BY Time;
(This gets all entries of the 3 most recent jobs; see the table and result here: SQL query for entries of last N jobs.)
Running the query against my SQLite DB in LINQPad takes 0.02 seconds.
I am trying to translate this to LINQ in EF Core.
I tried:
ServicesData.Where(x => ServicesData
    .Where(y => y.JobID != null)
    .GroupBy(y => y.JobID)
    .OrderByDescending(y => y.Max(z => z.Time))
    .Select(y => y.Key)
    .Take(3)
    .Contains(x.JobID)
)
This seems to give the same (correct) results as the SQL above.
However, LINQPad shows the following SQL for it:
SELECT *
FROM "ServicesData" AS "s"
WHERE EXISTS (
SELECT 1
FROM (
SELECT "s0"."JobID", MAX("s0"."Time") AS "c"
FROM "ServicesData" AS "s0"
WHERE "s0"."JobID" IS NOT NULL
GROUP BY "s0"."JobID"
ORDER BY MAX("s0"."Time") DESC
LIMIT 3
) AS "t"
WHERE ("t"."JobID" = "s"."JobID") OR ("t"."JobID" IS NULL AND "s"."JobID" IS NULL))
and it is very slow, taking 2.6 seconds!
For some reason the Contains was not translated to the IN operator.
What am I doing wrong?
CodePudding user response:
Does the query itself take 2.6 seconds, or is that the time it takes to get back your entities for the 3 resulting jobs? EF materializing entities takes time, so it will never be as fast as raw SQL spitting out results.
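If most of that time is materialization and the rows are only read, a no-tracking query is worth a try. A minimal sketch, assuming the same ServicesData set as in the question and that you don't need change tracking for these entities:

// Sketch: same query as in the question, but read-only, so EF skips
// change-tracking snapshots while materializing the entities.
var rows = ServicesData
    .AsNoTracking()
    .Where(x => ServicesData
        .Where(y => y.JobID != null)
        .GroupBy(y => y.JobID)
        .OrderByDescending(g => g.Max(z => z.Time))
        .Select(g => g.Key)
        .Take(3)
        .Contains(x.JobID))
    .OrderBy(x => x.Time)
    .ToList();

That won't change the generated SQL, but it can shave off a chunk of the materialization cost.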
Still, from that query this part seems a bit of a concern:
OR ("t"."JobID" IS NULL AND "s"."JobID" IS NULL))
Since the inner query only considers Job IDs that are not null, that clause shouldn't alter the results, but it honestly isn't necessary. EF generating an EXISTS rather than an IN clause is fairly common in its SQL generation, and generally shouldn't pose an obvious performance hit.
You might try:
ServicesData.Where(x => x.JobID != null
        && ServicesData
            .Where(y => y.JobID != null)
            .GroupBy(y => y.JobID)
            .OrderByDescending(y => y.Max(z => z.Time))
            .Select(y => y.Key)
            .Take(3)
            .Contains(x.JobID))
    .OrderBy(x => x.Time);
I'd be curious if that strips off the extra Null = Null allowance.
EF is a useful tool, but at the end of the day it needs to generate SQL that covers many possible permutations and combinations of operations, so it cannot be expected to pick the most efficient SQL in all cases. Sometimes using two simpler queries, or resorting to an explicit hand-written SQL statement / sproc, might be justified.
For instance:
var jobIds = ServicesData.Where(x => x.JobID != null)
    .GroupBy(x => x.JobID)
    .OrderByDescending(x => x.Max(y => y.Time))
    .Select(x => x.Key)
    .Take(3)
    .ToList();

var serviceData = ServicesData.Where(x => jobIds.Contains(x.JobID))
    .OrderBy(x => x.Time)
    .ToList();
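For the raw SQL / sproc route, the hand-written query from the question can be issued directly. A rough sketch using EF Core's FromSqlRaw, assuming a DbContext instance named db that exposes ServicesData:

// Rough sketch: execute the original SQL and let EF materialize the entities.
// (Assumes a DbContext instance named db exposing a ServicesData DbSet.)
var serviceData = db.ServicesData
    .FromSqlRaw(@"SELECT * FROM ServicesData
                  WHERE JobID IN (
                      SELECT JobID
                      FROM ServicesData
                      WHERE JobID IS NOT NULL
                      GROUP BY JobID
                      ORDER BY MAX(Time) DESC LIMIT 3)
                  ORDER BY Time")
    .ToList();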
It's a matter of weighing up performance against potentially introducing complexity in the form of work-arounds to avoid expensive performance traps that might come from EF's generation rules.
CodePudding user response:
In fact you aren't doing anything wrong! It's an EF issue that I have encountered too many times!
When you GroupBy on a column x and then apply Contains on x, you get exactly this translation to SQL!
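One way to sidestep that translation, as an untested sketch using the same ServicesData names from the question, is to join against the top-3 keys instead of calling Contains; a join usually comes out as a plain INNER JOIN on the subquery rather than the null-compensated EXISTS:

// Hypothetical workaround: join against the keyed subquery instead of
// using Contains, so no "JobID IS NULL" compensation is needed.
var top3JobIds = ServicesData
    .Where(y => y.JobID != null)
    .GroupBy(y => y.JobID)
    .OrderByDescending(g => g.Max(z => z.Time))
    .Select(g => g.Key)
    .Take(3);

var rows = ServicesData
    .Join(top3JobIds, s => s.JobID, id => id, (s, id) => s)
    .OrderBy(s => s.Time)
    .ToList();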