Group elements of the data set if they are next to each other with LINQ

Time:05-14

I have a data set (e.g. 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7) and I want to group items of the same value, but only if they appear next to each other at least 3 times in a row.

Is there a way? I've tried combinations of Count, GroupBy and Select in every way I know, but I can't find the right one.

Or if it can't be done with LINQ then maybe some other way?

CodePudding user response:

I don't think I'd strive for a 100% LINQ solution for this:

var r = new List<List<int>>() { new () { source.First() } };

foreach(var e in source.Skip(1)){
  if(e == r.Last().Last()) r.Last().Add(e);
  else r.Add(new(){ e });
}

return r.Where(l => l.Count > 2);

The .Last() calls can be replaced with [^1] if you like
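
As a rough sketch, here is the same loop with [^1] in place of .Last() (this needs C# 8 or later). It starts from an empty list, so it also copes with an empty source. The variable name runs is my own choice, and the sample data is the array from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data from the question.
int[] source = { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };

var runs = new List<List<int>>();
foreach (var e in source)
{
    // runs[^1] is the most recent run, runs[^1][^1] its last element.
    // Extend that run when the value repeats; otherwise start a new run.
    if (runs.Count > 0 && runs[^1][^1] == e) runs[^1].Add(e);
    else runs.Add(new List<int> { e });
}

// Keep only the runs with at least three equal neighbours.
var result = runs.Where(run => run.Count > 2).ToList();

foreach (var run in result)
    Console.WriteLine(string.Join(",", run));
// prints:
// 2,2,2
// 6,6,6
```

Indexing with [^1] is O(1) on a List, whereas .Last() has to go through an extension method call first, but for data of this size the difference is purely cosmetic.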

Output like:

[
  [2,2,2],
  [6,6,6]
]

Aggregate can be pushed into doing the same thing; the loop above is simply an accumulator (r), an iteration (foreach) and an operation on the result (Where):

var result = source.Skip(1).Aggregate(
    new List<List<int>>() { new List<int> { source.First() } }, 
    (r,e) => {
      if(e == r.Last().Last()) r.Last().Add(e);
      else r.Add(new List<int>(){ e });
      return r;
    },
    r => r.Where(l => l.Count > 2)
);

..but would you want to be the one to explain it to the new dev?


Another LINQy way would be to keep a counter that is incremented by one each time a value in the source array differs from the previous one, then group by this counter and return only the groups of 3 or more. I don't like this so much, though, because it's a bit "WTF".

var source = new[]{1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7};
int ctr = 0;
var result = source.Select(
  (e,i) => new[]{ i==0 || e != source[i-1] ? ++ctr : ctr, e}
)
.GroupBy(
  arr => arr[0], 
  arr => arr[1]
)
.Where(g => g.Count() > 2);

CodePudding user response:

If you're nostalgic and like stuff like the Obfuscated C code contest, you could solve it like this.
(No best practice claims included)

        int[] n = {1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7};
        var t = new int [n.Length][];
        for (var i = 0; i < n.Length; i++)
            t[i] = new []{n[i], i == 0 ? 0 : n[i] == n[i - 1] ? t[i - 1][1] : t[i - 1][1] + 1};

        var r = t.GroupBy(x => x[1], x => x[0])
                 .Where(g => g.Count() > 2)
                 .SelectMany(g => g);

        Console.WriteLine(string.Join(", ", r));

In the end, LINQ is likely not the best solution here. A simple for loop with one to three additional variables to track the "group index" and the last value probably makes more sense, even if it means writing a couple more lines of code.
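
For comparison, a minimal sketch of such a plain loop might look like this. It tracks only the index where the current run began; minRun, runStart and kept are names I made up for illustration, and the output is flattened the same way as in the snippet above:

```csharp
using System;
using System.Collections.Generic;

int[] n = { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };
const int minRun = 3;   // assumed name for "at least 3 in a row"

var kept = new List<int>();
var runStart = 0;       // index where the current run began

// Go one step past the end so the final run is closed as well.
for (var i = 1; i <= n.Length; i++)
{
    // Still inside the current run: nothing to do yet.
    if (i < n.Length && n[i] == n[runStart]) continue;

    // The run covering [runStart, i) just ended; keep it if it is long enough.
    if (i - runStart >= minRun)
        for (var j = runStart; j < i; j++) kept.Add(n[j]);

    runStart = i;
}

Console.WriteLine(string.Join(", ", kept));
// prints: 2, 2, 2, 6, 6, 6
```

This makes a single pass over the array and allocates nothing besides the output list.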

CodePudding user response:

I wouldn't use Linq just to use Linq.

I'd rather suggest using a simple for loop to loop over your input array and populate the output list. To keep track of which number is currently being repeated (if any), I'd use a variable (repeatedNumber) that's initially set to null.

In this approach, a number can only be assigned to repeatedNumber if it fulfills the minimum requirement of repeated items. Hence, for your example input, repeatedNumber would start at null, then eventually be set to 2, then be set to 6, and then be reset to null.

One perhaps good use of Linq here is to check if the minimum requirement of repeated items is fulfilled for a given item in input, by checking the necessary consecutive items in input:

input
    .Skip(items up to and including current item)
    .Take(minimum requirement of repeated items - 1)
    .All(equal to current item)

I'll name this minimum requirement of repeated items repetitionRequirement. (In your question post, repetitionRequirement is 3.)
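
Spelled out as real code, that check might look like the sketch below. StartsRun is a hypothetical helper name used only for illustration; the bounds test in front of the LINQ chain is there because All() only inspects the items that actually exist, so a run cut short by the end of the array would otherwise be accepted:

```csharp
using System;
using System.Linq;

int[] input = { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };

Console.WriteLine(StartsRun(input, 7, 3));   // True: input[7], input[8], input[9] are 2, 2, 2
Console.WriteLine(StartsRun(input, 0, 3));   // False: only two 1s in a row
Console.WriteLine(StartsRun(input, 12, 3));  // False: fewer than 3 items remain

// Hypothetical helper: does a run of at least repetitionRequirement
// equal items begin at index i?
static bool StartsRun(int[] input, int i, int repetitionRequirement)
{
    var following = repetitionRequirement - 1;

    // Bounds check first, then compare the items directly after index i.
    return i + following < input.Length
        && input.Skip(i + 1).Take(following).All(x => x == input[i]);
}
```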

The logic in the for loop goes as follows:

  • number = input[i]
  • If number is equal to repeatedNumber, it means that the previously repeated item continues being repeated
    • Add number to output
  • Otherwise, if the minimum requirement of repeated items is fulfilled for number (i.e. if the repetitionRequirement - 1 items directly following number in input are all equal to number), it means that number is the first instance of a new repeated item
    • Set repeatedNumber equal to number
    • Add number to output
  • Otherwise, if repeatedNumber has a value, it means that the previously repeated item just ended its repetition
    • Set repeatedNumber to null

Here is a suggested implementation:
(I'd suggest finding a more descriptive method name)

//using System.Collections.Generic;
//using System.Linq;

public static List<int> GetOutput(int[] input, int repetitionRequirement)
{
    var consecutiveCount = repetitionRequirement - 1;
    
    var output = new List<int>();
    
    int? repeatedNumber = null;
            
    for (var i = 0; i < input.Length; i++)
    {
        var number = input[i];
        
        if (number == repeatedNumber)
        {
            output.Add(number);
        }
        else if (i + consecutiveCount < input.Length &&
            input.Skip(i + 1).Take(consecutiveCount).All(num => num == number))
        {
            repeatedNumber = number;
            output.Add(number);
        }
        else if (repeatedNumber.HasValue)
        {
            repeatedNumber = null;
        }
    }
    
    return output;
}

By calling it with your example input:

var dataSet = new[] { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };

var output = GetOutput(dataSet, 3);

you get the following output:

{ 2, 2, 2, 6, 6, 6 }

