I need an understanding of how to filter a list by a composite key with the conditions "contains" and "does not contain".
I have a list
List<Entity>();
where entity is a class
public class Entity
{
int id;
string docNum;
int docVersion;
}
There is a similar second list. I need to get those elements of the list that match the following condition: "Get the elements of the first list, the document numbers of which are the same as the second, and the versions of the documents differ in the same document numbers".
I tried to use Contains()
in conjunction with Select()
like:
firstList
.Where(x => secondList.Select(y => y.docNum).Contains(x.docNum))
.Where(x => !secondList.Select(y => y.docVersion).Contains(x.docVersion))
but I'm afraid it does not take into account the fact that I need to compare document numbers as well.
It seems to me that this should be resolved through GroupBy()
or ToDictionary()
, but I can't get to the solution.
Please, if possible, show an example of a solution on standard linq elements, because I use NHibernate
CodePudding user response:
If i understood what you want this should do:
var results = list1
.Where(l1 => list2.Select(l2 => l2.DocNum).Contains(l1.DocNum))
.GroupJoin(list2, l1 => l1.DocNum, l2 => l2.DocNum, (e, c) => new KeyValuePair<string, IEnumerable<Entity>>(e.DocNum, c.Prepend(e)))
.ToArray();
or
var results = list1
.GroupJoin(list2, l1 => l1.DocNum, l2 => l2.DocNum, (e, c) => new KeyValuePair<string, IEnumerable<Entity>>(e.DocNum, c.Prepend(e)))
.Where(x => x.Value.Count() > 1)
.ToArray();
Then you can further select to get the entities you need from the resulting groups, maybe to get the one with the biggest version, or all of them...
Unfortunately I've never used NHibernate so I'm not really sure if this is supported, maybe you'll need to first select the two lists from the database, enumerate them, and then do this on the software side...
CodePudding user response:
Based on your requirement this might be easier to do using the LINQ Query syntax, the way you did and the solution I am posting are still LINQ just different ways of doing it.
List<Entity> firstList = new List<Entity>
{
new Entity
{
id = 1,
docNum = "two",
docVersion = 3
},
new Entity
{
id = 2,
docNum = "four",
docVersion = 4
},
new Entity
{
id = 3,
docNum = "ten",
docVersion = 6
}
};
List<Entity> ListTwo = new List<Entity>
{
new Entity
{
id = 1,
docNum = "two",
docVersion = 5
},
new Entity
{
id = 2,
docNum = "eight",
docVersion = 7
},
new Entity
{
id = 3,
docNum = "ten",
docVersion = 5
}
};
var resultList = from list in firstList
from list2 in ListTwo
where list.docNum == list2.docNum
where list.docVersion != list2.docVersion
select list;
First, we created the two lists on which our resulting list will be created from. These are
firstList
andListTwo
Then declared a
var
and assigned to it the result of our LINQ queryfrom list in firstList from list2 in ListTwo
is used to select our two data stores, and we give a name to refer to the individual element in those data storeslist
andlist2
where list.docNum == list2.docNum where list.docVersion != list2.docVersion
is used to satisfy our conditions. The first where clause checks to see if thedocNum
inlist
andlist2
match. Then, the second where clause checks to make sure that thedocVersion
inlist1
and `list2``` are not equal.select list
is used to select the item that matches our where clauses. EDIT if you need the Entity that matched from both sources changeselect list
toselect new {list,list2}
If you were to debug this code, you will see thatresultList
has two values (see image).
CodePudding user response:
You can do more complex comparisons using Any()
rather than Select().Contains()
. For instance, filtering if the composite key exists in secondList
goes like this:
.Where(x => !secondList.Any(y => y.docNum == x.docNum && y.docVersion == x.docVersion)
This is slightly more efficient in this case than Select().Contains()
since there's no intermediate object construction. In some cases where the intermediate object is more complex the savings go up.
I'd recommend that you use Any()
for your first condition as well, giving you something like:
firstList
.Where(x => x.Any(y => y.docNum == x.docNum))
.Where(x => !secondList.Any(y => y.docNum == x.docNum && y.docVersion == x.docVersion)
The first Where()
will filter out any items that don't have a matching docNum
in the list, the second will filter the remaining items to remove those that have a full composite match on the second list.
Of course this is terribly inefficient since you have to scan secondList
twice - partially, but still two scans. It would be better to do a lookup for version numbers keyed on document number and use that to do much more efficient filtering:
var lu = secondList.ToLookup(y => y.docNum, y => y.docVersion);
firstList.Where
(
x =>
{
var l = lu[x.docNum];
return l.Any() && !l.Any(v => v == x.docVersion);
}
)
Since ILookup<>
is reasonably well optimized the actual time taken to do the filter is greatly reduced. Using some very simple (mostly-)random list generation with reasonably high collisions I tested filtering 10,000 items against a list of 1,000,000 items to see what the difference would be. 49 seconds for the first option, 1.8 seconds with a lookup.
That said, this is really only going to work well on IEnumerable<>
. If you're operating on an IQueryable<>
then stick with Any()
and good indices.