Home > Back-end >  Linq filtering a list by a composite key (contains and not contains)
Linq filtering a list by a composite key (contains and not contains)

Time:02-19

I need an understanding of how to filter a list by a composite key with the conditions "contains" and "does not contain".

I have a list

List<Entity>();

where entity is a class

public class Entity
{
    int id;
    string docNum;
    int docVersion;
}

There is a similar second list. I need to get those elements of the list that match the following condition: "Get the elements of the first list, the document numbers of which are the same as the second, and the versions of the documents differ in the same document numbers".

I tried to use Contains() in conjunction with Select() like:

firstList
    .Where(x => secondList.Select(y => y.docNum).Contains(x.docNum))
    .Where(x => !secondList.Select(y => y.docVersion).Contains(x.docVersion))

but I'm afraid it does not take into account the fact that I need to compare document numbers as well.

It seems to me that this should be resolved through GroupBy() or ToDictionary(), but I can't get to the solution.

Please, if possible, show an example of a solution on standard linq elements, because I use NHibernate

CodePudding user response:

If i understood what you want this should do:

var results = list1
    .Where(l1 => list2.Select(l2 => l2.DocNum).Contains(l1.DocNum))
    .GroupJoin(list2, l1 => l1.DocNum, l2 => l2.DocNum, (e, c) => new KeyValuePair<string, IEnumerable<Entity>>(e.DocNum, c.Prepend(e)))
    .ToArray();

or

var results = list1
    .GroupJoin(list2, l1 => l1.DocNum, l2 => l2.DocNum, (e, c) => new KeyValuePair<string, IEnumerable<Entity>>(e.DocNum, c.Prepend(e)))
    .Where(x => x.Value.Count() > 1)
    .ToArray();

Then you can further select to get the entities you need from the resulting groups, maybe to get the one with the biggest version, or all of them...

Unfortunately I've never used NHibernate so I'm not really sure if this is supported, maybe you'll need to first select the two lists from the database, enumerate them, and then do this on the software side...

CodePudding user response:

Based on your requirement this might be easier to do using the LINQ Query syntax, the way you did and the solution I am posting are still LINQ just different ways of doing it.

List<Entity> firstList = new List<Entity>
        {
            new Entity
            {
                id = 1,
                docNum = "two",
                docVersion = 3
            },
             new Entity
            {
                id = 2,
                docNum = "four",
                docVersion = 4
            },
              new Entity
            {
                id = 3,
                docNum = "ten",
                docVersion = 6
            }
        };
        List<Entity> ListTwo = new List<Entity>
        {
            new Entity
            {
                id = 1,
                docNum = "two",
                docVersion = 5
            },
             new Entity
            {
                id = 2,
                docNum = "eight",
                docVersion = 7
            },
              new Entity
            {
                id = 3,
                docNum = "ten",
                docVersion = 5
            }
        };
        var resultList = from list in firstList
                         from list2 in ListTwo
                         where list.docNum == list2.docNum
                         where list.docVersion != list2.docVersion
                         select list;
  1. First, we created the two lists on which our resulting list will be created from. These are firstList and ListTwo

  2. Then declared a var and assigned to it the result of our LINQ query

  3. from list in firstList from list2 in ListTwo is used to select our two data stores, and we give a name to refer to the individual element in those data stores list and list2

  4. where list.docNum == list2.docNum where list.docVersion != list2.docVersion is used to satisfy our conditions. The first where clause checks to see if the docNum in list and list2 match. Then, the second where clause checks to make sure that the docVersion in list1 and `list2``` are not equal.

  5. select list is used to select the item that matches our where clauses. EDIT if you need the Entity that matched from both sources change select list to select new {list,list2}

If you were to debug this code, you will see thatresultList has two values (see image). enter image description here

CodePudding user response:

You can do more complex comparisons using Any() rather than Select().Contains(). For instance, filtering if the composite key exists in secondList goes like this:

.Where(x => !secondList.Any(y => y.docNum == x.docNum && y.docVersion == x.docVersion)

This is slightly more efficient in this case than Select().Contains() since there's no intermediate object construction. In some cases where the intermediate object is more complex the savings go up.

I'd recommend that you use Any() for your first condition as well, giving you something like:

firstList
.Where(x => x.Any(y => y.docNum == x.docNum))
.Where(x => !secondList.Any(y => y.docNum == x.docNum && y.docVersion == x.docVersion)

The first Where() will filter out any items that don't have a matching docNum in the list, the second will filter the remaining items to remove those that have a full composite match on the second list.

Of course this is terribly inefficient since you have to scan secondList twice - partially, but still two scans. It would be better to do a lookup for version numbers keyed on document number and use that to do much more efficient filtering:

var lu = secondList.ToLookup(y => y.docNum, y => y.docVersion);

firstList.Where
(
    x => 
    { 
        var l = lu[x.docNum]; 
        return l.Any() && !l.Any(v => v == x.docVersion);
    }
)

Since ILookup<> is reasonably well optimized the actual time taken to do the filter is greatly reduced. Using some very simple (mostly-)random list generation with reasonably high collisions I tested filtering 10,000 items against a list of 1,000,000 items to see what the difference would be. 49 seconds for the first option, 1.8 seconds with a lookup.

That said, this is really only going to work well on IEnumerable<>. If you're operating on an IQueryable<> then stick with Any() and good indices.

  • Related