Home > other >  Linq query to find the nth highest value in each group
Linq query to find the nth highest value in each group

Time:11-17

I have a group of students with Names and Marks. I want to return a list containing the Name of student and its nth highest marks where n = 3 in my case.

var list = await _studentsDbContext
  .MarksTable
  .GroupBy(i => i.Name)
  .Select(g => new PercentileEntry
      {
          StudentName = g.Key,
          Percentile = g
             .OrderByDescending(g => g.Marks)
             .Take(3)
             .Skip(2)
             .FirstOrDefault()
             .Marks
      })
  .ToListAsync();

I tried writing the above query but it gives exception

System.Collections.Generic.List cannot be used for parameter of type 'System.Linq.IQueryable`.

Is there a way to write a linq query for that?

CodePudding user response:

The problem is that you are reusing an alias in the same scope, I actually expect a different error message in this case, but change the inner g to something else (not already in use) in the orderby statement:
.OrderByDescending(m => m.Marks)

var list = await _studentsDbContext
  .MarksTable
  .GroupBy(i => i.Name)
  .Select(g => new PercentileEntry
      {
          StudentName = g.Key,
          Percentile = g
             .OrderByDescending(m => m.Marks)
             // This doesn't really optimise anything, so leave it out.
             //.Take(3)
             .Skip(2)
             .FirstOrDefault()
             .Marks
      })
  .ToListAsync();

There is no need to Take(3) in this case at all, not unless you were going to change the order or your database has some non-standard query mechanics.

This should prepare the following SQL:

SELECT [m].[Name] AS [StudentName], (
    SELECT [m0].[Marks]
    FROM [MarksTable] AS [m0]
    WHERE ([m].[Name] = [m0].[Name]) OR (([m].[Name] IS NULL) AND ([m0].[Name] IS NULL))
    ORDER BY [m0].[Marks] DESC
    OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY) AS [Percentile]
FROM [MarksTable] AS [m]
GROUP BY [m].[Name]

If you leave the .Take(3) in, the SQL is a bit more complicated, but is logically the similar:

SELECT [m].[Name] AS [StudentName], (
    SELECT [t].[Marks]
    FROM (
        SELECT TOP(3) [m0].[Id], [m0].[Marks], [m0].[Name], [m0].[Subject]
        FROM [MarksTable] AS [m0]
        WHERE ([m].[Name] = [m0].[Name]) OR (([m].[Name] IS NULL) AND ([m0].[Name] IS NULL))
        ORDER BY [m0].[Marks] DESC
    ) AS [t]
    ORDER BY [t].[Marks] DESC
    OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY) AS [Percentile]
FROM [MarksTable] AS [m]
GROUP BY [m].[Name]
  • Both of these queries produce the same execution plans because the TOP(3) is made redundant by OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY

You can explore this in the following fiddle: https://dotnetfiddle.net/1HoVWq


Update: 'GroupByShaperExpression: .OrderByDescending(m => m.marks)' could not be translated

If you experience exceptions like this, then it indicates that there is something else exotic about your PercentileEntry class, could be something as simple as not having a default constructor. I can't be specific about the causes or solutions to this without having more knowledge of the schema, but there are two geralised options to solve this

  1. Use anonymous types to extract the results from the query, then once the filtered and parsed data is in memory, then you can convert it to the required type

     var rawlist = await _studentsDbContext
             .MarksTable
             .GroupBy(i => i.Name)
             .Select(g => new // removed formal type definition 
              {
                  StudentName = g.Key,
                  Percentile = g
                  .OrderByDescending(m => m.Marks)
                  .Skip(2)
                  .FirstOrDefault()
                  .Marks
              })
               .ToListAsync();
    
     // convert to PercentileEntry if you really need to
     var list = rawlist.Select(x => new PercentileEntry
           {
               StudentName = x.StudentName,
               Percentile = x.Percentile 
           }).ToList();
    
  2. As recommended by the Exception message, you can move all, or most of this logic into the client-side, which is almost always guaranteed to work, but is usually the worst advice and if done incorrectly can be incredibly inefficient. Take the following, we can load the entire list of marks into memory, then it's just standard LINQ to objects.

     var list = (await _studentsDbContext.MarksTable.ToListAsync())
       .GroupBy(i => i.Name)
       .Select(g => new PercentileEntry
           {
               StudentName = g.Key,
               Percentile = g
                  .OrderByDescending(m => m.Marks)
                  .Skip(2)
                  .FirstOrDefault()
                  .Marks
           })
       .ToList();
    

A Note about .Take() and .Skip()

If your data is already in memory, so this is not a LINQ-to-Entities or SQL scenario, so if we are already in the client context...

Then the Take in this chain .Take(3).Skip(2).FirstOrDefault() is still redundant. .Skip(2) skips past the first 2 elements, then .FirstOrDefault() terminates the expression, so the .Take(3) never really gets evaluated.

  • .Take() and .Skip() do not run to completion before the next element is returned, not like OrderBy that is effectively fully executed before the records are yielded to the next IEnumerable statement. .Take() and .Skip() are non-blocking processes that evaluate and yield to all the chained IEnumerable operations before the next item is evaluated:

    1. Move Next => First Item
    2. .Take(3) => yield (1 returned)
    3. .Skip(2) => continue; (1 skipped)
    4. Move Next => Second Item
    5. .Take(3) => yield (2 returned)
    6. .Skip(2) => continue; (2 skipped)
    7. Move Next => Third Item
    8. .Take(3) => yield (2 returned)
    9. .Skip(2) => yield (2 have already skipped)
    10. .FirstOrDefault() => return (Breaks on the first record)

In that chain, we never get the the point where .Take() gets to influence the outcome of the expression, If there was another item, then it would break; the current loop, but not fully return like FirstOrDefault() does.

  • Interestingly, if you had used .Single() then it would have had a chance to be evaluated, but the final outcome would be the same. This is because .Single() actually performs a .Take(2) and throws an exception if there is a second record... just a fun fact ;)
  • Related