I have Website(Id)
table, each record may have multiple CheckLog(FK WebsiteId)
entries associated. CheckLog
also has a compound index of [WebsiteId, CreatedTime]
. Website
has only around 20 records but overtime CheckLog
would grow, 3 millions entries at the time I have this problem. (See schema using EF Core at the end of the question).
A frequent query I have is to query the list of all Website
s, along with zero/one latest CheckLog
record:
return await this.ctx.Websites.AsNoTracking()
.Select(q => new WebsiteListItem()
{
Website = q,
LatestCheckLog = q.CheckLogs
.OrderByDescending(q => q.CreatedTime)
.FirstOrDefault(),
})
.ToListAsync();
I believe the [WebsiteId, CreatedTime]
index should help. However, the query takes around 11s to execute. Here's the translated query, along with EXPLAIN QUERY PLAN
:
SELECT "w"."Id", "t0"."Id", "t0"."CreatedTime", "t0"."WebsiteId"
FROM "Websites" AS "w"
LEFT JOIN (
SELECT "t"."Id", "t"."CreatedTime", "t"."WebsiteId"
FROM (
SELECT "c"."Id", "c"."CreatedTime", "c"."WebsiteId", ROW_NUMBER() OVER(PARTITION BY "c"."WebsiteId" ORDER BY "c"."CreatedTime" DESC) AS "row"
FROM "CheckLogs" AS "c"
) AS "t"
WHERE "t"."row" <= 1
) AS "t0" ON "w"."Id" = "t0"."WebsiteId"
MATERIALIZE 1
CO-ROUTINE 4
SCAN TABLE CheckLogs AS c USING INDEX IX_CheckLogs_WebsiteId_CreatedTime
USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
SCAN SUBQUERY 4
SCAN TABLE Websites AS w
SEARCH SUBQUERY 1 AS t USING AUTOMATIC COVERING INDEX (WebsiteId=?)
Is this fixable with Index? If not, is there an efficient way to query it without creating N 1 queries? I tried to think of a way to do that with 2 queries but can't think of any better way to translate it the way EF Core does).
Also I believe this is a very common problem but I don't know what keyword I should use to find out solution for this kind of problem. I am okay with a general solution for this kind of problem (i.e. get the latest Product
of a list of Categories
). Thank you.
I use EF Core for DB Schema:
public class Website
{
public int Id { get; set; }
// Other properties
public ICollection<CheckLog> CheckLogs { get; set; }
}
[Index(nameof(CreatedTime))]
[Index(nameof(WebsiteId), nameof(CreatedTime))]
public class CheckLog
{
public int Id { get; set; }
public DateTime CreatedTime { get; set; }
public int WebsiteId { get; set; }
public Website Website { get; set; }
// Other properties
}
CodePudding user response:
If what you want is to get the row with the latest CreatedTime
for each WebsiteId
then there is no need for any join.
Just aggregate and set the condition:
HAVING MAX(CreatedTime)
This is not standard SQL, but utilizes SQLite's bare columns:
SELECT *
FROM CheckLogs
GROUP BY WebsiteId
HAVING MAX(CreatedTime);
If you want to join it to Websites
:
SELECT w.Id, t.Id, t.CreatedTime, t.WebsiteId
FROM Websites AS w
LEFT JOIN (
SELECT *
FROM CheckLogs
GROUP BY WebsiteId
HAVING MAX(CreatedTime)
) AS t ON w.Id = t.WebsiteId;
CodePudding user response:
Thanks to this answer, I found out how to rewrite the query:
SELECT L.*
FROM CheckLogs L
INNER JOIN
(SELECT WebsiteId, Max(CreatedTime) AS CreatedTime
FROM CheckLogs
GROUP BY WebsiteId) L2
ON L.WebsiteId = L2.WebsiteId AND L.CreatedTime = L2.CreatedTime
Since EF Core cannot translate JOIN query, I rewrote the code this way, 2 round trip to database server. Note that since WebsiteIds are numbers, it's safe to put it this way, if you put in string parameters, you need to sanitize them.
var websites = await query.ToListAsync();
var websiteIds = websites.Select(q => q.Id).ToList();
var websiteIdsString = string.Join(",", websiteIds);
var logQuery = this.ctx.CheckLogs.FromSqlRaw(@$"
SELECT L.*
FROM CheckLogs L
INNER JOIN
(SELECT WebsiteId, Max(CreatedTime) AS CreatedTime
FROM CheckLogs
GROUP BY WebsiteId) L2
ON L.WebsiteId = L2.WebsiteId AND L.CreatedTime = L2.CreatedTime
WHERE L.WebsiteId IN ({websiteIdsString})");
var logs = await logQuery.AsNoTracking().ToListAsync();
var logDict = logs.ToLookup(q => q.WebsiteId);
return websites.Select(q => new WebsiteListItem()
{
Website = q,
LatestCheckLog = logDict[q.Id].FirstOrDefault(),
}).ToList();
CodePudding user response:
First try this query:
SELECT MAX("c"."CreatedTime"), "c"."WebsiteId" FROM "CheckLogs" AS "c" GROUP BY "c"."WebsiteId"
It will be hard to optimize your query to be faster than the above. If it is fast, then there is hope.
you could try:
with logs as (
select max("c"."Id") "Id", max("c"."CreatedTime") "CreatedTime", "c"."WebsiteId"
from "CheckLogs" AS "c"
group by "c"."WebsiteId"
having count(*) = 1)
select "w"."Id", "l"."Id", "l"."CreatedTime", "l"."WebsiteId"
from "Websites" AS "w"
join logs as "l" on "w"."Id" = "l"."WebsiteId"
union all
select "w"."Id", null, null, null
from "Websites" AS "w"
where not exists (
select 'x'
from "CheckLogs" AS "l"
where "l"."WebsiteId" = "w"."Id")
You could also try the two halves of union all
above to see how fast they are.
If still slow you can add a new table with the result you are looking for and create trigger on "CheckLogs" to populate it, so the data is ready.