I have two lists of objects which have a common property called OrderNumber.
The first list has about 20000 items and the second list has about 1.5 million items.
I need an efficient way to find the items in list 1 that don't have a match in list 2. I'm currently using LINQ, and it takes more than 20 minutes to compute the result. I haven't been able to find an efficient solution online.
My code so far:
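```csharp
// For each item in list1, list2 is re-projected and scanned linearly,
// so this performs up to list1.Count * list2.Count comparisons.
notmatched.AddRange(list1
    .Where(l1 => !list2.Select(l2 => l2.OrderNumber).Contains(l1.OrderNumber))
    .Select(l1 => new SomeObj
    {
        OrderNumber = l1.OrderNumber
    }));
```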
Answer:
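Using LINQ's built-in Except extension method with a custom IEqualityComparer<T> is fast enough. Except builds a hash set from the second sequence and then makes a single pass over the first, so the work grows roughly with the combined size of the two lists rather than with their product. Note that Except is a set operation: if the first list contains the same OrderNumber more than once, only its first occurrence is returned. My implementation may not fit your use case exactly, but with 1.5 million Poco objects in firstList and 20,000 in secondList it executes in under 1 second. In your case, list1 would be the first sequence and list2 the second, assuming both item types implement IOrderNumber.
Documentation: IEqualityComparer<T>, Enumerable.Except
DotNetFiddle demo (list sizes reduced to work around the site's memory limits)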
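```csharp
using System;
using System.Collections.Generic;

// Classes used in the test:
public interface IOrderNumber
{
    string OrderNumber { get; set; }
}

public class Poco : IOrderNumber
{
    public string OrderNumber { get; set; }
}

public class Podo : IOrderNumber
{
    public string OrderNumber { get; set; }
}

// Treats two IOrderNumber instances as equal when their OrderNumber values match.
public class DataEqualityComparer : IEqualityComparer<IOrderNumber>
{
    public bool Equals(IOrderNumber p1, IOrderNumber p2)
    {
        if (ReferenceEquals(p1, p2)) return true;
        if (p1 is null || p2 is null) return false;
        // Compare the actual values: equal hash codes alone do not imply equality.
        return string.Equals(p1.OrderNumber, p2.OrderNumber, StringComparison.Ordinal);
    }

    public int GetHashCode(IOrderNumber p1)
    {
        // Consistent with Equals: equal OrderNumbers produce equal hash codes.
        return p1?.OrderNumber?.GetHashCode() ?? 0;
    }
}
```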
Then your code would look like this:
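```csharp
using System;
using System.Diagnostics;
using System.Linq;

var firstList = Enumerable.Range(1, 1500000).Select(x => new Poco { OrderNumber = x.ToString() }).ToList();
var secondList = Enumerable.Range(50, 20000).Select(x => new Podo { OrderNumber = x.ToString() }).ToList();

// Except hashes secondList once, then streams firstList through that set.
// The element type is inferred as IOrderNumber from the comparer.
Stopwatch sw = Stopwatch.StartNew();
var result = firstList.Except(secondList, new DataEqualityComparer()).ToList();
sw.Stop();

Console.WriteLine($"Duration: {sw.Elapsed:G}");
```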
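For comparison, here is a minimal sketch of the same hashing idea without a custom comparer: put the order numbers of the large list into a HashSet<string> once, then filter the small list against it. Order, Unmatched, NotIn and ExceptNumbers are hypothetical names used only for illustration (Order stands in for the question's item types). Unlike Except, the HashSet filter keeps duplicate entries from the first list; ExceptNumbers shows the Except behaviour on the same data.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's item types; only OrderNumber matters.
public class Order
{
    public string OrderNumber { get; set; }
}

public static class Unmatched
{
    // Hashed anti-join: one pass over `second` to build the set,
    // then one O(1) average lookup per item of `first`.
    // Duplicates in `first` are kept.
    public static List<Order> NotIn(IEnumerable<Order> first, IEnumerable<Order> second)
    {
        var known = new HashSet<string>(second.Select(o => o.OrderNumber));
        return first.Where(o => !known.Contains(o.OrderNumber)).ToList();
    }

    // Same filter via Enumerable.Except on the projected strings.
    // Except is a set operation, so repeated order numbers collapse to one.
    public static List<string> ExceptNumbers(IEnumerable<Order> first, IEnumerable<Order> second)
    {
        return first.Select(o => o.OrderNumber)
                    .Except(second.Select(o => o.OrderNumber))
                    .ToList();
    }
}
```

With first holding order numbers "1", "1", "2", "3" and second holding "2", NotIn returns three orders (1, 1, 3) while ExceptNumbers returns two numbers (1, 3). Which behaviour you want depends on whether duplicates in list1 matter.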
Answer:
Here is a simpler version of your code. It drops the intermediate Select projection, but I'm not sure it will make the process noticeably faster, because it still scans list2 once for every item in list1. Please let me know if it does.
```csharp
// Any() stops at the first match, but it still walks list2 for every l1.
notmatched.AddRange(list1
    .Where(l1 => !list2.Any(l2 => l2.OrderNumber == l1.OrderNumber))
    .Select(l1 => new SomeObj
    {
        OrderNumber = l1.OrderNumber
    }));
```
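To see why the Any-based filter still scales poorly, here is a small sketch that counts the equality checks it performs. It assumes, for illustration only, that both lists are plain lists of order-number strings; ScanCost and CountAnyComparisons are hypothetical names. When nothing matches, every item in list1 is compared with every item in list2, which for 20,000 × 1,500,000 items is 3 × 10^10 comparisons.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ScanCost
{
    // Counts the equality checks made by an Any-based "not matched" filter.
    // Hypothetical helper: both lists are assumed to be order-number strings.
    public static long CountAnyComparisons(List<string> list1, List<string> list2)
    {
        long comparisons = 0;

        // Count() forces the lazy Where/Any pipeline to run to completion.
        _ = list1
            .Where(n1 => !list2.Any(n2 =>
            {
                comparisons++;
                return n2 == n1;
            }))
            .Count();

        return comparisons;
    }
}
```

For two unmatched items against three candidates the count is 6 (2 × 3), while an item that matches the first candidate costs a single check. The short-circuit only helps matched items; the unmatched items, which are exactly the ones the question is looking for, always pay for a full scan of list2.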