Entity Framework Core 6 lazy loading takes "loading data when needed" too literally

Time:11-14

I have a simple class called Company and a corresponding service (CompanyService) to access an Azure SQL Database. A company contains Users.

public class Company
{
    [Key]
    public int Id { get; set; }

    [Required]
    [StringLength(64, MinimumLength = 5)]
    public string Name { get; set; }

    public ICollection<User> Users { get; set; } = new List<User>(); // removing the initialization does not make any difference
}

public class User
{
    [Key]
    public int Id { get; set; }

    [Required]
    [StringLength(64, MinimumLength = 5)]
    public string Name { get; set; }

    public Company Company { get; set; }

    public int CompanyId { get; set; }
}

Both of these models are registered in my DbContext, and simple CRUD operations work. I then started to play around with the UI and wanted to display the users of a company. Basically, you have a table with all companies; you select a company and click the "edit" button, and a new view opens where the company's properties can be updated and all of its users are displayed.

After implementing this, I realized that my Company.Users list is completely empty, even though there should be 10 dummy users in it. I checked the view where all users are displayed, and all of them are there. I navigated to my companies page again, selected the same company, and what do I see? The company's users!

So the problem is this: the user data is only returned by my CompanyService AFTER I have accessed the user data through my UserService. I have no clue why this is happening.

I access my data like this:

public async Task<Company?> GetCompanyById(int id)
{
    return await this.Context.Company.Include(c => c.Users).FirstOrDefaultAsync(c => c.Id == id);
}

My DbContext is registered like this:

builder.Services.AddDbContextFactory<DatabaseContext>(options => options.UseSqlServer(builder.Configuration.GetConnectionString("DefaultConnection")));

Answer:

You're not actually using lazy loading at all. In EF Core, lazy loading is opt-in: you need to enable it (for example with UseLazyLoadingProxies() from the Microsoft.EntityFrameworkCore.Proxies package) and declare navigation properties such as Company.Users as virtual. Even then, it only works while the company is still within the scope of the DbContext it was read from. This can trip you up when passing entities around. For instance, if lazy loading is enabled and you access an unloaded collection after the DbContext has been disposed, you will get an exception.
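For reference, here is a minimal sketch of what opting in to proxy-based lazy loading might look like for the models in the question. It assumes the Microsoft.EntityFrameworkCore.Proxies and Microsoft.EntityFrameworkCore.SqlServer packages are referenced; the LazyLoadingSetup helper is a hypothetical name used only for illustration, and the validation attributes from the question are omitted for brevity.

```csharp
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

// Hypothetical helper (not from the original post) that turns on
// proxy-based lazy loading when the context options are configured.
public static class LazyLoadingSetup
{
    public static void Configure(DbContextOptionsBuilder options, string connectionString)
        => options
            .UseLazyLoadingProxies()          // from Microsoft.EntityFrameworkCore.Proxies
            .UseSqlServer(connectionString);  // from Microsoft.EntityFrameworkCore.SqlServer
}

// With proxies, entity classes must be public and unsealed,
// and every navigation property must be virtual.
public class Company
{
    public int Id { get; set; }
    public string Name { get; set; } = "";

    public virtual ICollection<User> Users { get; set; } = new List<User>();
}

public class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";

    public virtual Company Company { get; set; } = null!;
    public int CompanyId { get; set; }
}
```

The helper could then be called from the existing registration, e.g. AddDbContextFactory<DatabaseContext>(options => LazyLoadingSetup.Configure(options, connectionString)). Whether lazy loading is actually desirable here is a separate question; it is shown only to make clear what "using lazy loading" would involve.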

What you are seeing instead is EF's entity tracking and its automatic population of references (often called relationship fix-up). When you request an entity that has relationships to other entities, you can eager load those relationships to ensure the associated entities are loaded and referenced, or leave them for EF to work out. EF will automatically associate any related entities that it happens to already be tracking, regardless of whether you tell it to eager load or not.

So, for example, let's say I have a Parent-Child relationship. Parent ID P1 has 3 children with IDs C1, C2, and C3. With lazy loading disabled, if I use the following statement:

var parent = context.Parents.Single(p => p.Id == "P1");
var count = parent.Children.Count();

I will get "0" (provided the Children collection is initialized; otherwise the collection is null and Count() throws an ArgumentNullException). If I turn on lazy loading, or I eager load like below:

var parent = context.Parents.Include(p => p.Children).Single(p => p.Id == "P1");
var count = parent.Children.Count();

in both cases I would get "3".

Now here is where things get interesting. Suppose lazy loading is disabled (or Children is not virtual) and I do the following:

var child = context.Children.Single(c => c.Id == "C1");
var parent = context.Parents.Single(p => p.Id == "P1");
var count = parent.Children.Count();

I will get a count of "1". It is not "0", which you might expect because I didn't eager load, and not "3", because lazy loading isn't in effect. If 2 of the related children happened to be tracked, I'd get "2". Normally you wouldn't do something so obvious, but any earlier code that happened to load related entities into the same DbContext will cause them to be attached automatically to the navigation properties of rows you load later. This can lead to confusing bugs where related data is sometimes available and sometimes not, or where only part of the data is available.

This is one underlying reason why you want DbContext lifespans to be as short as possible. The more entities a DbContext is tracking, the longer it takes to "work out" possible relationships between tracked entities to populate references, and the more likely it is that stale tracked instances get used rather than fresh, current data from the DB.
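If you suspect that a long-lived context is the cause, it can help to look at what it is currently tracking. The following is a small diagnostic sketch, assuming the User entity from the question; the TrackingDiagnostics class and its method names are hypothetical and only for illustration.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical diagnostic helpers (not from the original post).
public static class TrackingDiagnostics
{
    // Number of User instances the context is currently tracking.
    // Any of these can be attached to a Company loaded later.
    public static int CountTrackedUsers(DbContext context)
        => context.ChangeTracker.Entries<User>().Count();

    // Stops tracking all entities (available since EF Core 5),
    // so later queries are not affected by earlier loads.
    public static void Reset(DbContext context)
        => context.ChangeTracker.Clear();
}
```

Clearing the tracker is more of a workaround than a fix; a short-lived context per operation avoids the problem in the first place.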

Some best practices with EF for avoiding issues like this:

  1. When reading information for display, use projection rather than loading entities, i.e. load ViewModels using Select or AutoMapper's ProjectTo. This avoids loading tracked entities and ensures that the data reflects the currently persisted state. It also tends to build far more efficient queries than loading entities with eager loaded relationships. (Eager loading collections can get slow, since by default EF joins the related tables in a single query, which produces Cartesian products behind the scenes.)
  2. If you don't need to update data but do need to load entities, use AsNoTracking. This keeps entities from filling up the tracking cache, slowing things down, and causing these problems.
  3. When updating entities, always eager load the relationships that need to be updated. Ideally, design systems to update child relations independently of the parent, i.e. AddChild, RemoveChild, UpdateChild, rather than making changes to a children collection and trying to update them along with every other change within a single UpdateParent.
  4. Ensure that DbContexts are not alive any longer than they are absolutely needed. Long-lived DbContexts collect expensive baggage.
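As a rough sketch of points 1, 2 and 4 applied to the question's CompanyService: the version below assumes the service receives the IDbContextFactory<DatabaseContext> that AddDbContextFactory registers, and that DatabaseContext exposes the Company set used in the question. CompanyDetails is a hypothetical view model introduced here for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical view model for the "edit company" view (not from the original post).
public record CompanyDetails(int Id, string Name, List<string> UserNames);

public class CompanyService
{
    private readonly IDbContextFactory<DatabaseContext> contextFactory;

    public CompanyService(IDbContextFactory<DatabaseContext> contextFactory)
        => this.contextFactory = contextFactory;

    // Points 1 and 4: project straight into a view model from a context
    // that lives only for the duration of this call.
    public async Task<CompanyDetails?> GetCompanyDetails(int id)
    {
        await using var context = await contextFactory.CreateDbContextAsync();

        return await context.Company
            .Where(c => c.Id == id)
            .Select(c => new CompanyDetails(
                c.Id,
                c.Name,
                c.Users.Select(u => u.Name).ToList()))
            .FirstOrDefaultAsync();
    }

    // Points 2 and 4: when entities are needed but will not be updated,
    // load them untracked from a short-lived context.
    public async Task<Company?> GetCompanyById(int id)
    {
        await using var context = await contextFactory.CreateDbContextAsync();

        return await context.Company
            .AsNoTracking()
            .Include(c => c.Users)
            .FirstOrDefaultAsync(c => c.Id == id);
    }
}
```

Because each call creates and disposes its own context, what a UserService loaded earlier can no longer influence what Company.Users contains.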