Home > Net >  Entity Framework LINQ SQL Query Performance
Entity Framework LINQ SQL Query Performance

Time:11-25

Hello everyone I'm working on an API that returns a dish with its restaurant details from a database that has restaurants and their dishes. I'm wondering if the following makes the query any efficient by converting the first, to second:

from res in _context.Restaurant
join resdish in _context.RestaurantDish
on res.Id equals resdish.RestaurantId
where resdish.RestaurantDishId == dishId

Second:

from resdish in _context.RestaurantDish
where resdish.RestaurantDishId == dishId
join res in _context.Restaurant
on resdish.RestaurantId equals res.Id

The reason why I'm debating this is because I feel like the second version filters to the single restaurant dish, then joining it, rather than joining all dishes then filtering. Is this correct?

CodePudding user response:

You can use a profiler on your database to capture the SQL in both cases, or inspect the SQL that EF generates and you'll likely find that the SQL in both cases is virtually identical. It boils down to how the reader (developers) interprets the intention of the logic.

As far as building efficient queries in EF goes, EF is an ORM meaning it offers to map between an object-oriented model and a relational data model. It isn't just an API to enable translating Linq to SQL. Part of the power for writing simple and efficient queries is through the use of navigation properties and projection. A Dish will be considered the property of a particular Restaurant, while a Restaurant has many Dishes on its menu. This forms a One-to-Many relationship in the database, and navigation properties can map this relationship in your object model:

public class Restaurant
{
    [Key]
    public int RestaurantId { get; set; }
    // ... other fields

    public virtual ICollection<Dish> Dishes { get; set; } = new List<Dish>();
}

public class Dish
{
    [Key]
    public int DishId { get; set; }


    //[ForeignKey(nameof(Restaurant))]
    //public int RestaurantId { get; set; }

    public virtual Restaurant Restaurant { get; set; }
}

The FK propery for the Restaurant ID is optional and can be configured to use a Shadow Property. (One that EF knows about and generates, but isn't exposed in the Entity) I recommend using shadow properties for FKs mainly to avoid 2 sources of truth for relationships. (dish.RestaurantId and dish.Restaurant.RestaurantId) Changing the FK does not automatically update the relationship unless you reload the entity, and updating the relationship does not automatically update the FK until you call SaveChanges.

Now if you wanted to get a particular dish and it's associated restaurant:

var dish = _context.Dishes
    .Include(d => d.Restaurant)
    .Single(d => d.DishId == dishId);

This fetches both entities. Note that there is no need now to manually write Joins like you would with SQL. EF supports Join, but it should only be used in very rare cases where a schema isn't properly normalized/relational and you need to map loosely joined entities/tables. (Such as a table using an "OwnerId" that could join to a "This" or a "That" table based on a discriminator such as OwnerType.)

If you leave off the .Include(d => d.Restaurant) and have lazy loading enabled on the DbContext, then EF would attempt to automatically load the Restaurant if and when the first attempt of the code to access dish.Restaurant. This provides a safety net, but can incur some steep performance penalties in many cases, so it should be avoided or treated as a safety net, not a crutch.

Eager loading works well when dealing with single entities and their related data where you will need to do things with those relationships. For instance if I want to load a Restaurant and review, add/remove dishes, or load a Dish and possibly change the Restaurant. However, eager loading can come at a significant cost in how EF and SQL provides that related data behind the scenes.

By default when you use Include, EF will add an INNER or LEFT join between the associated tables. This creates a Cartesian Product between the involved tables. If you have 100 restaurants that have an average of 30 dishes each and select all 100 restaurants eager loading their dishes, the resulting query is 3000 rows. Now if a Dish has something like Reviews and there are an average of 5 reviews per dish and you eager load Dishes and Reviews, that would be a resultset of every column across all three tables with 15000 rows in total. You can hopefully appreciate how this can grow out of hand pretty fast. EF then goes through that Cartesian and populates the associated entities in the object graph. This can lead to questions about why "my query runs fast in SSMS but slow in EF" since EF can have a lot of work to do, especially if it has been tracking references from restaurants, dishes, and/or reviews to scan through and provide. Later versions of EF can help mitigate this a bit by using query splitting so instead of JOINs, EF can work out to fetch the related data using multiple separate SELECT statements which can execute and process a fair bit faster, but it still amounts to a lot of data going over the wire and needing memory to materialize to work with.

Most of the time though, you won't need ALL rows, nor ALL columns for each and every related entity. This is where Projection comes in such as using Select. When we pull back our list of restaurants, we might want to list the restaurants in a given city along with their top 5 dishes based on user reviews. We only need the RestaurantId & Name to display in these results, along with the Dish name and # of positive reviews. Instead of loading every column from every table, we can define a view model for Restaurants and Dishes for this summary View, and project the entities to these view models:

public class RestaurantSummaryViewModel
{
    public int RestaurantId { get; set; }
    public string Name { get; set; }
    public ICollection<DishSummaryViewModel> Top5Dishes { get; set; } = new List<DishSummaryViewModel>();
}

public class DishSummaryViewModel
{
    public string Name { get; set; }
    public int PositiveReviewCount {get; set; }
}

var restaurants = _context.Restaurants
    .Where(r => r.City.CityId == cityId)
    .OrderBy(r => r.Name)
    .Select(r => new RestaurantSummaryViewModel
    {
        RestaurantId = r.RestaurantId,
        Name = r.Name,
        Top5Dishes = r.Dishes
            .OrderByDescending(d => d.Reviews.Where(rv => rv.Score > 3).Count())
            .Select(d => new DishSummaryViewModel
            {
                Name = d.Name,
                PositiveReviewCount = d.Reviews.Where(rv => rv.Score > 3).Count()
            }).Take(5)
            .ToList();
    }).ToList();

Notice that the above Linq example doesn't use Join or even Include. Provided you follow a basic set of rules to ensure that EF can work out what you want to project down to SQL you can accomplish a fair bit producing far more efficient queries. The above statement would generate SQL to run across the related tables but would only return the fields needed to populate the desired view models. This allows you to tune indexes based on what data is most commonly needed, and also reduces the amount of data going across the wire, plus memory usage on both the DB and app servers. Libraries like Automapper and it's ProjectTo method can simplify the above statements even more, configuring how to select into the desired view model once, then replacing that whole Select( ... ) with just a ProjectTo<RestaurantSummaryViewModel>(config) where "config" is a reference to the Automapper configuration where it can resolve how to turn Restaurants and their associated entities into the desired view model(s).

In any case it should give you some avenues to explore with EF and learning what it can bring to the table to produce (hopefully:) easy to understand, and efficient query expressions.

  • Related