T-SQL version of query
Let's setup a simple table with some data:
DROP TABLE IF EXISTS #OrdersTable
CREATE TABLE #OrdersTable
(
Id int,
Custid int
);
INSERT INTO #OrdersTable (Id, Custid) VALUES (1, 71);
INSERT INTO #OrdersTable (Id, Custid) VALUES (2, 71);
INSERT INTO #OrdersTable (Id, Custid) VALUES (3, 71);
INSERT INTO #OrdersTable (Id, Custid) VALUES (4, 72);
INSERT INTO #OrdersTable (Id, Custid) VALUES (5, 72);
INSERT INTO #OrdersTable (Id, Custid) VALUES (6, 72);
INSERT INTO #OrdersTable (Id, Custid) VALUES (7, 73);
INSERT INTO #OrdersTable (Id, Custid) VALUES (8, 74);
INSERT INTO #OrdersTable (Id, Custid) VALUES (9, 74);
In this case, customers 71 and 72 each have 3 orders. Customer 73 has 1 order. Customer 74 has 2 orders.
Let's say we want to know the customers with the largest number of orders.
The following query:
SELECT TOP (1) WITH TIES Custid
FROM #OrdersTable
GROUP BY Custid
ORDER BY COUNT(*) DESC;
results in the following:
Custid
-----------
71
72
(2 rows affected)
Naive conversion to EF Core
Given the following class:
public class Order
{
public int Id { get; set; }
public int Custid { get; set; }
}
and the following data:
var OrdersTable = new List<Order>()
{
new Order(){ Id = 1, Custid = 71},
new Order(){ Id = 2, Custid = 71},
new Order(){ Id = 3, Custid = 71},
new Order(){ Id = 4, Custid = 72},
new Order(){ Id = 5, Custid = 72},
new Order(){ Id = 6, Custid = 72},
new Order(){ Id = 7, Custid = 73},
new Order(){ Id = 8, Custid = 74},
new Order(){ Id = 9, Custid = 74},
};
here's a naive conversion of the query to EF Core:
var n = OrdersTable.GroupBy(order => order.Custid).Select(grouping => grouping.Count()).Max();
var custids = OrdersTable.GroupBy(order => order.Custid).Where(grouping => grouping.Count() == n).Select(grouping => grouping.Key);
Using the following to display the data:
foreach (var custid in custids)
Console.WriteLine(custid);
we get:
71
72
Question
Let's put T-SQL and EF Core versions side by side. T-SQL:
SELECT TOP (1) WITH TIES Custid
FROM #OrdersTable
GROUP BY Custid
ORDER BY COUNT(*) DESC;
EF Core:
var n = OrdersTable.GroupBy(order => order.Custid).Select(grouping => grouping.Count()).Max();
var custids = OrdersTable.GroupBy(order => order.Custid).Where(grouping => grouping.Count() == n).Select(grouping => grouping.Key);
My question is, is there a more efficient way to implement this query in EF Core?
Full program
The full C# console program that demonstrates the above query:
using System;
using System.Collections.Generic;
using System.Linq;
namespace EfCoreTop1Ties
{
public class Order
{
public int Id { get; set; }
public int Custid { get; set; }
}
internal class Program
{
static void Main(string[] args)
{
var OrdersTable = new List<Order>()
{
new Order(){ Id = 1, Custid = 71},
new Order(){ Id = 2, Custid = 71},
new Order(){ Id = 3, Custid = 71},
new Order(){ Id = 4, Custid = 72},
new Order(){ Id = 5, Custid = 72},
new Order(){ Id = 6, Custid = 72},
new Order(){ Id = 7, Custid = 73},
new Order(){ Id = 8, Custid = 74},
new Order(){ Id = 9, Custid = 74},
};
var n = OrdersTable.GroupBy(order => order.Custid).Select(grouping => grouping.Count()).Max();
var custids = OrdersTable.GroupBy(order => order.Custid).Where(grouping => grouping.Count() == n).Select(grouping => grouping.Key);
foreach (var custid in custids)
Console.WriteLine(custid);
}
}
}
CodePudding user response:
You should be able to do that with one query:
var n = OrdersTable
.GroupBy(order => order.Custid)
.Select(grouping => new { CustomerId = grouping.Key, OrderCount = grouping.Count() })
.OrderByDescending(g => g.OrderCount)
.ToList();
var maxCount = n.First().OrderCount;
var custIds = n.Where(g => g.OrderCount == maxCount)
.Select(g => g.CustomerId)
.ToList();
In the case where OrdersTable was a DbContext DBSet this would result in 1 query to the database. The check for returning items that match that count is done in memory from the results without a return trip.
If you are dealing with a particularly large dataset you can consider some reasonable assumptions such as that if there were thousands of customers whether it is likely that 100 or 1000 would "tie" for the largest # of orders.
var n = OrdersTable
.GroupBy(order => order.Custid)
.Select(grouping => new { CustomerId = grouping.Key, OrderCount = grouping.Count() })
.OrderByDescending(g => g.OrderCount)
.Take(100)
.ToList();
if (n.All(g => g.OrderCount == n.First().OrderCount)
// Redo query with larger threshold.
It would run the query again if all returned rows happened to have the same order count. You may want to handle such a scenario a different way, such as when all customers have 0 orders. (If in the case you have criteria for selecting what customers are to be compared, such as per city/state etc.)
With a very large data table to cover, another option would be just to query the max count, provided your entities were configured with navigation properties so your Customer entity can relate to it's Orders collection:
var maxOrderCount = dbContext.Customers
.OrderByDescending(x => x.Orders.Count)
.Select(x => x.Orders.Count)
.First();
var customers = dbContext.Customers
.Where(x => x.Orders.Count == maxOrderCount)
.ToList();
adding .Select(x => x.CustomerId)
before the ToList()
if you just want the customer ID. It runs two queries, but they are reasonably simple ones and only return the data required rather than potentially all customer IDs/data.