Home > Enterprise >  Split a list of objects into sub-lists of contiguous elements using LINQ?
Split a list of objects into sub-lists of contiguous elements using LINQ?

Time:09-17

I have a simple class Item:

public class Item
{
  public int Start { get; set;}
  public int Stop { get; set;}
}

Given a List<Item> I want to split this into multiple sublists of contiguous elements. e.g. a method

List<Item[]> GetContiguousSequences(Item[] items)

Each element of the returned list should be an array of Item such that list[i].Stop == list[i 1].Start for each element

e.g.

{[1,10], [10,11], [11,20], [25,30], [31,40], [40,45], [45,100]} 

=>

{{[1,10], [10,11], [11,20]}, {[25,30]}, {[31,40],[40,45],[45,100]}}

Here is a simple (and not guaranteed bug-free) implementation that simply walks the input data looking for discontinuities:

List<Item[]> GetContiguousSequences(Item []items)
{
        var ret = new List<Item[]>();

        var i1 = 0;
        for(var i2=1;i2<items.Length;  i2)
        {
            //discontinuity
            if(items[i2-1].Stop != items[i2].Start)
            {
                var num = i2 - i1;
                ret.Add(items.Skip(i1).Take(num).ToArray());
                i1 = i2;
            }
        }
        //end of array
        ret.Add(items.Skip(i1).Take(items.Length-i1).ToArray());

        return ret;
}

It's not the most intuitive implementation and I wonder if there is a way to have a neater LINQ-based approach. I was looking at Take and TakeWhile thinking to find the indices where discontinuities occur but couldn't see an easy way to do this.

Is there a simple way to use IEnumerable LINQ algorithms to do this in a more descriptive (not necessarily performant) way?

I set of a simple test-case here: https://dotnetfiddle.net/wrIa2J

CodePudding user response:

I'm really not sure this is much better than your original, but for the purpose of another solution the general process is

  1. Use Select to project a list working out a grouping
  2. Use GroupBy to group by the above
  3. Use Select again to project the grouped items to an array of Item
  4. Use ToList to project the result to a list

public static List<Item[]> GetContiguousSequences2(Item []items)
{
    var currIdx = 1;
    return items.Select( (item,index) =>  new {
            item = item,
            index = index == 0 || items[index-1].Stop == item.Start ? currIdx :   currIdx
        })
        .GroupBy(x => x.index, x => x.item)
        .Select(x => x.ToArray())
        .ToList();      
}

Live example: https://dotnetfiddle.net/mBfHru


Another way is to do an aggregation using Aggregate. This means maintaining a final Result list and a Curr list where you can aggregate your sequences, adding them to the Result list as you find discontinuities. This method looks a little closer to your original

public static List<Item[]> GetContiguousSequences3(Item []items)
{
    var res = items.Aggregate(new {Result = new List<Item[]>(), Curr = new List<Item>()}, (agg, item) => {
            if(!agg.Curr.Any() || agg.Curr.Last().Stop == item.Start) {
                agg.Curr.Add(item);
            } else {
                agg.Result.Add(agg.Curr.ToArray());
                agg.Curr.Clear();   
                agg.Curr.Add(item);
            }
        return agg;
    });     
    res.Result.Add(res.Curr.ToArray()); // Remember to add the last group
    return res.Result;
}

Live example: https://dotnetfiddle.net/HL0VyJ

CodePudding user response:

You can implement ContiguousSplit as a corutine: let's loop over source and either add item into current range or return it and start a new one.

  private static IEnumerable<Item[]> ContiguousSplit(IEnumerable<Item> source) {
    List<Item> current = new List<Item>();

    foreach (var item in source) {
      if (current.Count > 0 && current[current.Count - 1].Stop != item.Start) {
        yield return current.ToArray();

        current.Clear();
      }

      current.Add(item);
    }  

    if (current.Count > 0)
      yield return current.ToArray();
  }

then if you want materialization

List<Item[]> GetContiguousSequences(Item []items) => ContiguousSplit(items).ToList();

CodePudding user response:

Your solution is okay. I don't think that LINQ adds any simplification or clarity in this situation. Here is a fast solution that I find intuitive:

static List<Item[]> GetContiguousSequences(Item[] items)
{
    var result = new List<Item[]>();
    int start = 0;
    while (start < items.Length) {
        int end = start   1;
        while (end < items.Length && items[end].Start == items[end - 1].Stop) {
            end  ;
        }

        int len = end - start;
        var a = new Item[len];
        Array.Copy(items, start, a, 0, len);
        result.Add(a);
        start = end;
    }
    return result;
}
  • Related