I'm currently working on a .NET Core 3.1 application.
I need to filter a list using LINQ and optimize the performance as much as possible.
My current code works fine, though I'm wondering about the performance.
The objective:
- A list of SearchResult items needs to be filtered
- Each Id must appear only once in the result
- Sometimes I get two items with the same Id, e.g. 1; then I need to remove one of them
- BUT: if several items share the same Id, e.g. 1, I need to remove the ones without a ContactId. If none of them has a ContactId, I can take the first occurrence.
- The final result needs to be ordered by Id, then by ContactId
using System;
using System.Collections.Generic;
using System.Linq;

public class SearchResult
{
    public int? Id { get; set; }
    public int? ContactId { get; set; }
}

public class Program
{
    public static void Main()
    {
        var searchResults = new List<SearchResult>
        {
            new SearchResult { Id = 1 },
            new SearchResult { Id = 2 }, // yes
            new SearchResult { Id = 3 }, // yes
            new SearchResult { Id = 4 }, // yes
            new SearchResult { Id = 5 },
            new SearchResult { Id = 1, ContactId = 1 }, // yes
            new SearchResult { Id = 5, ContactId = 3 }, // yes
            new SearchResult { Id = 1, ContactId = 1 },
            new SearchResult { Id = 8, ContactId = 4 }, // yes
            new SearchResult { Id = 1 },
            new SearchResult { Id = 2 },
            new SearchResult { Id = 10 }, // yes
            new SearchResult { Id = 11 }, // yes
            new SearchResult { Id = 12 }, // yes
        };

        // group1: first item per Id among the items WITHOUT a ContactId
        var group1 = searchResults
            .Where(sr => sr.ContactId == null)
            .GroupBy(sr => sr.Id)
            .Select(grp => grp.First());

        // group2: first item per Id among the items WITH a ContactId
        var group2 = searchResults
            .Where(sr => sr.ContactId != null)
            .GroupBy(sr => sr.Id)
            .Select(grp => grp.First());

        // joined: items of group1 whose Id does not occur in group2
        var joined = group1.Where(g1 => !group2.Any(g2 => g2.Id == g1.Id));

        // merged: group2 plus joined
        var merged = new List<SearchResult>();
        merged.AddRange(group2);
        merged.AddRange(joined);

        // result: ordered by Id, then by ContactId
        var result = merged
            .OrderBy(x => x.Id)
            .ThenBy(x => x.ContactId);

        foreach (var sr in result)
        {
            Console.WriteLine(sr.Id + " " + sr.ContactId);
        }
    }
}
So far so good - my code "works" but perhaps someone has an idea on how to improve this code and its performance?
CodePudding user response:
A simpler solution would be to sort the data first and then group it by Id. This should perform better than the original approach, which re-evaluates the group2 query for every item in group1.
var searchResults = new List<SearchResult>
{
    new SearchResult { Id = 1 },
    new SearchResult { Id = 2 }, // yes
    new SearchResult { Id = 3 }, // yes
    new SearchResult { Id = 4 }, // yes
    new SearchResult { Id = 5 },
    new SearchResult { Id = 1, ContactId = 1 }, // yes
    new SearchResult { Id = 5, ContactId = 3 }, // yes
    new SearchResult { Id = 1, ContactId = 1 },
    new SearchResult { Id = 8, ContactId = 4 }, // yes
    new SearchResult { Id = 1 },
    new SearchResult { Id = 2 },
    new SearchResult { Id = 10 }, // yes
    new SearchResult { Id = 11 }, // yes
    new SearchResult { Id = 12 }, // yes
};

// Sort by Id, with items that have a ContactId first inside each Id,
// then keep the first item of every Id group.
var result = searchResults
    .OrderBy(x => x.Id)
    .ThenByDescending(x => x.ContactId)
    .GroupBy(p => p.Id)
    .Select(x => x.First());

foreach (var sr in result)
{
    Console.WriteLine(sr.Id + " " + sr.ContactId);
}
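The descending sort on ContactId matters because of how null values are ordered. As a minimal sketch, assuming LINQ to Objects with the default comparer (which places null before any value for `int?`), the snippet below shows that an ascending sort puts an entry without a ContactId first, while a descending sort puts an entry with a ContactId first. The `NullOrderDemo` class and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, only used to illustrate null ordering.
public static class NullOrderDemo
{
    // Ascending order with the default comparer: nulls come first.
    public static List<int?> Ascending(IEnumerable<int?> values) =>
        values.OrderBy(v => v).ToList();

    // Descending order with the default comparer: nulls come last.
    public static List<int?> Descending(IEnumerable<int?> values) =>
        values.OrderByDescending(v => v).ToList();

    public static void Main()
    {
        var contactIds = new int?[] { null, 3, null, 1 };

        Console.WriteLine(Ascending(contactIds).First().HasValue);  // False
        Console.WriteLine(Descending(contactIds).First().HasValue); // True
    }
}
```

Note that when several items of the same Id have a ContactId, the descending sort keeps the one with the highest ContactId.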
CodePudding user response:
All you need is this simple and efficient query:
List<SearchResult> resultList = searchResults
    .GroupBy(s => s.Id)
    .OrderBy(sg => sg.Key)
    .Select(sg => sg
        .OrderBy(s => s.ContactId != null ? 0 : 1)
        .ThenBy(s => s.ContactId)
        .First())
    .ToList();
The ordering happens after the GroupBy, so less sorting is necessary.
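Sorting inside each group can be avoided altogether. The following is a sketch that is not part of either answer, assuming LINQ to Objects: for each Id it scans the group once for the first item that has a ContactId, falls back to the first occurrence otherwise, and then sorts only the surviving items. `Dedup.PickPreferred` is a hypothetical helper name, and taking the first item with a ContactId (rather than the lowest or highest ContactId) when there are several is an assumption the question leaves open.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SearchResult
{
    public int? Id { get; set; }
    public int? ContactId { get; set; }
}

// Hypothetical helper, named for illustration only.
public static class Dedup
{
    public static List<SearchResult> PickPreferred(IEnumerable<SearchResult> items) =>
        items
            .GroupBy(s => s.Id)
            // Prefer the first item with a ContactId; otherwise the first occurrence.
            .Select(g => g.FirstOrDefault(s => s.ContactId != null) ?? g.First())
            // Only one item per Id is left, so sorting by Id is enough.
            .OrderBy(s => s.Id)
            .ToList();

    public static void Main()
    {
        var searchResults = new List<SearchResult>
        {
            new SearchResult { Id = 5 },
            new SearchResult { Id = 1 },
            new SearchResult { Id = 1, ContactId = 1 },
            new SearchResult { Id = 5, ContactId = 3 },
            new SearchResult { Id = 2 },
            new SearchResult { Id = 2 },
        };

        foreach (var sr in PickPreferred(searchResults))
        {
            Console.WriteLine(sr.Id + " " + sr.ContactId);
        }
    }
}
```

Because each Id appears only once after the grouping, a ThenBy on ContactId would not change the order, so the sketch sorts by Id only.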