I have an entity containing several navigation properties and I need to load three of them. The initial Linq query I wrote is:
await _context.EventParticipant
.Include(x => x.Employee)
.Include(x => x.ExcuseDescription)
.Include(x => x.RegistrationMethod)
.Where(x => x.EventId == eventId)
.ToListAsync();
This led to massive performance issues, causing the query to seemingly never finish when there were about 300 rows to fetch. The first thing I took a look at was the query generated by LINQ which was this:
SELECT
[e].[EmployeeNumber], [e].[EventId], [e].[ExcuseDescriptionId],
[e].[IsInvited], [e].[RegistrationMethodId],
[v].[employee_number], [v].[company_email],
[v].[department_name], [v].[employee_email],
[v].[employeecard_rfid], [v].[firstname], [v].[lastname],
[v].[status], [v].[mandant], [v].[upn], [v].[userid],
[e0].[Id], [e0].[Description], [r].[Id], [r].[Description]
FROM
[EventParticipant] AS [e]
INNER JOIN
[V_Employee] AS [v] ON [e].[EmployeeNumber] = [v].[employee_number]
LEFT JOIN
[ExcuseDescription] AS [e0] ON [e].[ExcuseDescriptionId] = [e0].[Id]
LEFT JOIN
[RegistrationMethod] AS [r] ON [e].[RegistrationMethodId] = [r].[Id]
WHERE
[e].[EventId] = @__eventId_0
Now, this query runs in SSMS in less than a second, so my next guess was that the problem is caused by entity tracking, but adding AsNoTracking()
before materializing the entities led to no improvements.
Interestingly enough, the debug output would show that the query was executed after about 16 seconds (which is still extremely slow), but the entities would still never materialize. The query does load a few extra columns so the next thing I tried was to select only the columns that I need, so I ended up with the following:
await _context.EventParticipant
.Where(x => x.EventId == eventId).AsNoTracking()
.Select(participant => new EventParticipantViewModel()
{
EventId = eventId,
EmployeeNumber = participant.EmployeeNumber,
Department = participant.Employee.DepartmentName,
FirstName = participant.Employee.Firstname,
LastName = participant.Employee.Lastname,
IsInvited = participant.IsInvited,
ExcuseDescription = participant.ExcuseDescription.Description ?? null,
RegisterMethod = participant.RegistrationMethod.Description ?? null,
})
.ToListAsync();
Now, this generated the following query:
SELECT @__eventId_0 AS [EventId], [e].[EmployeeNumber], [v].[department_name] AS [Department], [v].[firstname] AS [FirstName], [v].[lastname] AS [LastName], [e].[IsInvited], COALESCE([e0].[Description], NULL) AS [ExcuseDescription], COALESCE([r].[Description], NULL) AS [RegisterMethod]
FROM [EventParticipant] AS [e]
INNER JOIN [V_Employee] AS [v] ON [e].[EmployeeNumber] = [v].[employee_number]
LEFT JOIN [ExcuseDescription] AS [e0] ON [e].[ExcuseDescriptionId] = [e0].[Id]
LEFT JOIN [RegistrationMethod] AS [r] ON [e].[RegistrationMethodId] = [r].[Id]
WHERE [e].[EventId] = @__eventId_0
Although the query itself is not much different than the original one, the performance impact is immense: the query above executes and entities materialize in less than 2 seconds with the same 300-ish rows!
Could anyone please explain why is this happening? What kind of magic does the EF Core do in the background that would impact the performance like this?
CodePudding user response:
As @Stu mentioned, it's hard to say for sure not knowing the indexes involved or the amount of child collection records involved, but realize that .Include essentially does a select * (using all columns from the child table) which likely is pulling a different index than your custom projection. The custom projection may be using a covering index and thus increasing your performance.
The other thing you didn't mention is which version of EF (core I'm assuming) you are using. Some versions have generated drastically different queries from the same LINQ expression and some of them could result in better or worse performance. For example, earlier versions of EF core might split this query into multiple lazy loaded expressions or injects order by on the child table Ids which if not included in the indexes could lead to performance issues (particularly with unique identifier Ids).
CodePudding user response:
In both queries most of the time is spent waiting for your linked server. And plan changes for distributed queries can be very painful. So while you haven't captured the actual execution plan for any really long-running execution, it's probably caused by your linked server.
<WaitStats>
<Wait WaitType="OLEDB" WaitTimeMs="939" WaitCount="16" />
</WaitStats>
First, do you really have to use a linked server? Can't you just copy the data to the local database?
If you do use linked server, try to never join remote tables with local tables. You it's more reliable to load a temp table with data from your linked server and then join that to your local tables.