Home > other >  Group by NOT having element in common?
Group by NOT having element in common?

Time:03-26

I have a class such as:

public class SomeClass {
    public HashSet<int> Items;
}

I want to group SomeClass with other SomeClass that DON'T share any Items in common.
Example: Assume we have 3 SomeClass (S1-S4) such as:
S1:
• Items: 1, 2, 6
S2:
• Items: 3, 4, 5
S3:
• Items: 1
S4:
• Items: 1, 5
S5:
• Items: 8, 9

In that case, the groups should be as follows:
G1: S1, S2, S5
G2: S3
G3: S4

Ideally, items should be placed in the group with the least members so all groups can be relatively the same size.

How can this be achieved?

CodePudding user response:

There's no way to define this grouping in Linq. GroupBy in Linq requires a definition of "equality", and that definition must be transitive, meaning that if A=B and B=C, then A=C must be true. Your definition is not transitive, as in your inputs, S1=S2, and S2=S3, but S1!=S3.

What you could do is loop through all items, creating new groups as needed, and adding items to groups to keep the sizes low as you state.

So you could start with S1, look for other items that contain 1, and put them in new groups (S3 and S4 in this case). Then look for items that contain 2 put them in the groups created in step 1, alternating groups.

By that logic you should end up with:

G1: S1, S2
G2: S3, S5
G3: S4

But, again, that can't be done in Linq - you'll have to code the loop and grouping logic yourself.

CodePudding user response:

Answer is strictly for entertainment purposes, provides a single statement conversion that may be even considered LINQ you count "Aggregate" as LINQ (LINQPad-ready):

void Main()
{
    var list = new[] {
        new SomeClass("s1", 1,2,6),
        new SomeClass("s2", 3,4,5), 
        new SomeClass("s3", 1),
        new SomeClass("s4", 1,5), 
        new SomeClass ("s5", 8,9)
    };
    
    var r = list.Aggregate(new List<IEnumerable<SomeClass>>().AsEnumerable(),
        (result, cur) => result
            // groups that should be copied
            .Where(w => w != result.FirstOrDefault(
                  x => x.SelectMany(c => c.Items).Intersect(cur.Items).Count() == 0))
            .Concat(Enumerable.Repeat(
                // adding to existing one
                (result.FirstOrDefault(
                   x => x.SelectMany(c => c.Items).Intersect(cur.Items).Count() == 0) ??
                // adding to new one
                new List<SomeClass>())
                    // add current item to existing/new group
                    .Concat(Enumerable.Repeat(cur, 1)),
                1))
            );
                                 
    r.Dump();
}

// Define other methods and classes here
public class SomeClass
{
    public SomeClass(string name, params int[] items)
    {
        Name = name;
        Items= new HashSet<int>(items);
    }
    public string Name;
    public HashSet<int> Items;
}

Sort-of explanation:

on every iteration

  • split list of groups into two half - first suitable group to add current item and the rest using the same condition.
  • for groups in the rest (.Where selects all but the potential first suitable group) just let them move to next iteration untouched
  • for the suitable group there are two options - we actually have a group to add to or there is no such group and we need to create new one. To unify code we just use result.FirstOrDefault(...) ?? new List<...>() - this gives a list to add new item which is either have items from previous iterations or new one.
  • add current item to the suitable group - .Concat requires IEnumerable which we construct with Enumerable.Repeat(cur,1).
  • concatenate "list of groups to copy" with "list of single group with current element".

Notes if you try to hack up the same code yourself:

  • it is very easy to add "current" group to multiple existing groups if you just check "no intersection". We must only consider first suitable group to get code working correctly
  • subsequently it is easy to lose existing if you try to use some sort of "intersects" filter that doesn't stop on first match.
  • Think about top level Concat as "split list into two parts based on true/false" and merge modified results.
  • Creating new group is easy to get wrong/miss altogether.
  • Related