LINQ JOIN performance optimization

I'm currently working on a .NET Core 3.1 application.

I need to filter a list using LINQ and optimize the performance as much as possible.

My current code works fine, though I'm wondering about the performance.

The objective:

- A list of SearchResult items needs to be filtered.
- Each Id may appear only once in the result.
- Sometimes I get two items with the same Id, e.g. 1; then I need to remove one of them.
- BUT: if there are several items with the same Id (e.g. 1), I need to remove the items without a ContactId. If none of the items has a ContactId, I can keep the first occurrence.
- The final result needs to be ordered by Id, then by ContactId.

using System;
using System.Collections.Generic;
using System.Linq;

public class SearchResult
{
    public int? Id {get; set;}
    public int? ContactId {get; set;}
}

public class Program
{
    public static void Main()
    {
        var searchResults = new List<SearchResult>
        {
            new SearchResult { Id = 1 },
            new SearchResult { Id = 2 }, // yes
            new SearchResult { Id = 3 }, // yes
            new SearchResult { Id = 4 }, // yes
            new SearchResult { Id = 5 },
            
            new SearchResult { Id = 1, ContactId = 1 }, // yes
            new SearchResult { Id = 5, ContactId = 3 }, // yes
            
            new SearchResult { Id = 1, ContactId = 1 },
            new SearchResult { Id = 8, ContactId = 4 }, // yes
            
            new SearchResult { Id = 1 }, 
            new SearchResult { Id = 2 }, 
            new SearchResult { Id = 10 }, // yes
            new SearchResult { Id = 11 }, // yes
            new SearchResult { Id = 12 }, // yes
        };
        
        // group1 without contactId
        var group1 = searchResults
                        .Where(sr => sr.ContactId == null)
                        .GroupBy(sr => sr.Id)
                        .Select(grp => grp.First());
        
        // group2 WITH contactId
        var group2 = searchResults
                        .Where(sr => sr.ContactId != null)
                        .GroupBy(sr => sr.Id)
                        .Select(grp => grp.First());
        
        // joined = group1.Id - group2.Id
        var joined = group1.Where(g1 => !group2.Any(g2 => g2.Id == g1.Id));
        
        // result = group2 + joined
        var merged = new List<SearchResult>();
        merged.AddRange(group2);
        merged.AddRange(joined);
        
        // result ordered by id then by contactId
        var result = merged
            .OrderBy(x => x.Id)
            .ThenBy(x => x.ContactId);
        
        foreach(var sr in result){         
            Console.WriteLine(sr.Id + " " + sr.ContactId);
        }       
    }
}

So far so good: my code "works", but does anyone have an idea how to improve this code and its performance?
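A sketch that makes the cost of the join above visible, reusing the question's own pipeline; DeferredDemo, CountFilterCalls and the calls counter are hypothetical names added only for illustration. Because group2 is a deferred query, every group2.Any(...) call inside the Where re-runs its Where and GroupBy over the whole list, so the work grows with (distinct Ids without a ContactId) × (list size). Materializing group2 once with ToList() removes that repetition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SearchResult
{
    public int? Id { get; set; }
    public int? ContactId { get; set; }
}

public static class DeferredDemo
{
    // The sample list from the question.
    public static List<SearchResult> Sample() => new List<SearchResult>
    {
        new SearchResult { Id = 1 }, new SearchResult { Id = 2 },
        new SearchResult { Id = 3 }, new SearchResult { Id = 4 },
        new SearchResult { Id = 5 },
        new SearchResult { Id = 1, ContactId = 1 }, new SearchResult { Id = 5, ContactId = 3 },
        new SearchResult { Id = 1, ContactId = 1 }, new SearchResult { Id = 8, ContactId = 4 },
        new SearchResult { Id = 1 }, new SearchResult { Id = 2 },
        new SearchResult { Id = 10 }, new SearchResult { Id = 11 }, new SearchResult { Id = 12 },
    };

    // Runs the question's group1/group2/joined pipeline once and returns how
    // many times group2's ContactId filter was invoked (hypothetical helper).
    public static int CountFilterCalls(List<SearchResult> source, bool materialize)
    {
        int calls = 0;

        var group1 = source
            .Where(sr => sr.ContactId == null)
            .GroupBy(sr => sr.Id)
            .Select(grp => grp.First());

        IEnumerable<SearchResult> group2 = source
            .Where(sr => { calls++; return sr.ContactId != null; })
            .GroupBy(sr => sr.Id)
            .Select(grp => grp.First());

        // ToList() evaluates group2 once; without it, each Any() call below
        // re-runs the Where + GroupBy over the whole source list.
        if (materialize)
            group2 = group2.ToList();

        // Force the join to run so the counter reflects a full evaluation.
        _ = group1.Where(g1 => !group2.Any(g2 => g2.Id == g1.Id)).ToList();
        return calls;
    }

    public static void Main()
    {
        var data = Sample();
        Console.WriteLine("deferred group2:     " + CountFilterCalls(data, materialize: false));
        Console.WriteLine("materialized group2: " + CountFilterCalls(data, materialize: true));
    }
}
```

With the 14-item sample, the filter should run 14 times when group2 is materialized and 112 times (8 distinct Ids without a ContactId × 14 items) when it is left deferred.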

Answer:

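A simpler solution is to sort the data first (by Id, then by ContactId descending, so items that have a ContactId come first) and then group it by Id, taking the first item of each group. That is one sort plus one grouping pass, whereas the original code re-evaluates the deferred group2 query inside Any for every element of group1.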

    var searchResults = new List<SearchResult>
    {
        new SearchResult { Id = 1 },
        new SearchResult { Id = 2 }, // yes
        new SearchResult { Id = 3 }, // yes
        new SearchResult { Id = 4 }, // yes
        new SearchResult { Id = 5 },
        
        new SearchResult { Id = 1, ContactId = 1 }, // yes
        new SearchResult { Id = 5, ContactId = 3 }, // yes
        
        new SearchResult { Id = 1, ContactId = 1 },
        new SearchResult { Id = 8, ContactId = 4 }, // yes
        
        new SearchResult { Id = 1 }, 
        new SearchResult { Id = 2 }, 
        new SearchResult { Id = 10 }, // yes
        new SearchResult { Id = 11 }, // yes
        new SearchResult { Id = 12 }, // yes
    };
    
    var result = searchResults
                 .OrderBy(x => x.Id)
                 .ThenByDescending(x => x.ContactId)
                 .GroupBy(p => p.Id)
                 .Select(x => x.First());
    
    foreach(var sr in result){         
        Console.WriteLine(sr.Id + " " + sr.ContactId);
    }
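To see why ThenByDescending(x => x.ContactId) puts the items that have a ContactId at the top of each Id, here is a small sketch of how nullable ints sort; NullOrderingDemo and Descending are hypothetical names used only for illustration. The default comparer for int? treats null as smaller than every value, so a descending sort moves null to the end, and First() then picks an item with a ContactId whenever the group has one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullOrderingDemo
{
    // Sorts nullable ContactIds the same way the answer's ThenByDescending does
    // (both use Comparer<int?>.Default).
    public static List<int?> Descending(IEnumerable<int?> contactIds) =>
        contactIds.OrderByDescending(x => x).ToList();

    public static void Main()
    {
        // Comparer<int?>.Default ranks null below any value.
        Console.WriteLine(Comparer<int?>.Default.Compare(null, 1) < 0);

        var sorted = Descending(new int?[] { null, 3, null, 1 });
        // Real ContactIds first (largest first), nulls last.
        Console.WriteLine(string.Join(", ", sorted.Select(x => x?.ToString() ?? "null")));
    }
}
```

For the input { null, 3, null, 1 } the sketch should print True and then 3, 1, null, null.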

Answer:

All you need is this simple and efficient query:

List<SearchResult> resultList = searchResults
    .GroupBy(s => s.Id)
    .OrderBy(sg => sg.Key)
    .Select(sg => sg.OrderBy(s => s.ContactId != null ? 0 : 1).ThenBy(s => s.ContactId).First())
    .ToList();

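Ordering after the GroupBy means only the distinct Ids are sorted, plus each group on its own, instead of the whole list, so less sorting is necessary.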
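If each group only has to yield one item, the per-group sort in the query above can be replaced by a single scan of the group. The sketch below assumes the question's rule (prefer an item with a ContactId, otherwise the first occurrence); DedupById and Dedup are hypothetical names. When several items in a group have a ContactId, it keeps the first of them, as the question's own code does, rather than the one with the smallest ContactId.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SearchResult
{
    public int? Id { get; set; }
    public int? ContactId { get; set; }
}

public static class DedupById
{
    // Keeps one item per Id: the first item that has a ContactId,
    // otherwise the first item of the group. Only the distinct Ids are sorted.
    public static List<SearchResult> Dedup(IEnumerable<SearchResult> source) =>
        source
            .GroupBy(s => s.Id)
            .OrderBy(g => g.Key)
            .Select(g => g.FirstOrDefault(s => s.ContactId != null) ?? g.First())
            .ToList();
}

public class Program
{
    public static void Main()
    {
        var input = new List<SearchResult>
        {
            new SearchResult { Id = 1 },
            new SearchResult { Id = 2 },
            new SearchResult { Id = 1, ContactId = 1 },
            new SearchResult { Id = 5, ContactId = 3 },
            new SearchResult { Id = 5 },
        };

        foreach (var sr in DedupById.Dedup(input))
            Console.WriteLine(sr.Id + " " + sr.ContactId);
    }
}
```

For the five-item input in Main it should print one line each for Ids 1, 2 and 5, with ContactIds 1, none and 3.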