TakeWhile, but I want the rest of the input sequence as well

Time:09-22

I would like something that effectively performs the same as TakeWhile but returns two sequences:

  1. The results of TakeWhile
  2. The rest of the input sequence with 1. removed

I know I could do something like:

var a = input.TakeWhile(...);
var b = input.Skip(a.Count());

But this seems potentially non-optimal depending on the container type. Have I missed some neat way to do this in a single operation?

My end goal is to iterate over a large collection rather than pre-bucketing it:

while(data.Count() > 0)
{
    var y = data.First().Year;
    var year = data.TakeWhile(c => c.Year == y);
    data = data.Skip(year.Count());

    Console.WriteLine($"{year.Count()} items in {y}");
}

CodePudding user response:

You could use ToLookup to split the source into two results.

var source = new[] { 1, 3, 5, 2, 4, 6, 7, 8, 9 };
Func<int, bool> criteria = x => x % 2 == 1;
bool stillGood = true;
Func<int, bool> takeWhileCriteria = x =>
  stillGood = stillGood && criteria(x);

var result = source.ToLookup(takeWhileCriteria);
var matches = result[true];
var nonMatches = result[false];
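
To make the behaviour concrete, here is a minimal sketch that wraps the same idea in a helper. The name SplitWhile is hypothetical and not part of LINQ. Keep in mind that ToLookup executes immediately and buffers the entire source, so this makes a single pass but does not stream.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new[] { 1, 3, 5, 2, 4, 6, 7, 8, 9 };
var (matches, rest) = source.SplitWhile(x => x % 2 == 1);

Console.WriteLine(string.Join(",", matches)); // 1,3,5
Console.WriteLine(string.Join(",", rest));    // 2,4,6,7,8,9

public static class SplitWhileExtensions
{
    // Hypothetical helper: one pass over source, but fully buffered by ToLookup.
    public static (IEnumerable<T> Matches, IEnumerable<T> Rest) SplitWhile<T>(
        this IEnumerable<T> source, Func<T, bool> criteria)
    {
        // A fresh flag per call, so the stateful lambda is never reused across sequences.
        var stillGood = true;
        var lookup = source.ToLookup(x => stillGood = stillGood && criteria(x));

        // Indexing a lookup with a missing key yields an empty sequence, not an exception.
        return (lookup[true], lookup[false]);
    }
}
```

Creating the flag inside the helper avoids the pitfall of the snippet above, where takeWhileCriteria stays "flipped" if it is ever applied to a second sequence.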

CodePudding user response:

The simplest way to split the sequence in a single, streaming pass is to return each item paired with a bool that says whether it is still "in" the leading part or not.

public static IEnumerable<(T Entity, bool IsIn)> MarkWhile<T>(this IEnumerable<T> sequence, 
    Func<T,bool> predicate)
{
    var isIn = true;
    using var etor = sequence.GetEnumerator();
    while (etor.MoveNext())
    {
        var current = etor.Current;
        isIn = isIn && predicate(current); // short-circuit: stop calling predicate after the first failure
        yield return (current, isIn);
    }
}

This allows you to iterate over a large collection without exhausting it and to determine when the condition "flips". But you'll need a foreach loop to do this in one pass.

It would be possible to create a method that only exhausts the "in" part of the sequence, and even returns its count (we can do anything when returning tuples), and then streams the tail of the sequence, but I would settle for a simple foreach. Nothing wrong with that. Also, there may be cases where all items meet the condition while you still only want to return a limited number of items.
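
A minimal sketch of the foreach approach, assuming an int sequence and an "is odd" predicate purely for illustration. The extension method is repeated here (written with foreach instead of a manual enumerator) so that the snippet compiles on its own; the tail is collected in a list only to make the result visible, whereas real code would process each item in place.

```csharp
using System;
using System.Collections.Generic;

var source = new[] { 1, 3, 5, 2, 4, 6, 7 };
var inCount = 0;
var tail = new List<int>();

foreach (var (item, isIn) in source.MarkWhile(x => x % 2 == 1))
{
    if (isIn)
        inCount++;      // still inside the TakeWhile part
    else
        tail.Add(item); // the first failing item and everything after it
}

Console.WriteLine($"{inCount} leading items, {tail.Count} in the tail");

public static class MarkWhileExtensions
{
    public static IEnumerable<(T Entity, bool IsIn)> MarkWhile<T>(
        this IEnumerable<T> sequence, Func<T, bool> predicate)
    {
        var isIn = true;
        foreach (var current in sequence)
        {
            // Once the predicate has failed, it is not evaluated again.
            isIn = isIn && predicate(current);
            yield return (current, isIn);
        }
    }
}
```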

CodePudding user response:

You can create something like what you want, but only in very limited circumstances:

public static class IEnumerableExt {
    public static IEnumerable<T> ToIEnumerable<T>(this IEnumerator<T> e) {
        while (e.MoveNext())
            yield return e.Current;
    }

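    public static (IEnumerable<T> first, IEnumerable<T> rest) FirstRest<T>(this IEnumerable<T> src, Func<T,bool> InFirstFn) {
        var e = src.GetEnumerator();
        var first = new List<T>();
        bool hasCurrent;
        while ((hasCurrent = e.MoveNext()) && InFirstFn(e.Current))
            first.Add(e.Current);

        // e now sits on the first item that failed InFirstFn (if any); put it back in
        // front of rest, otherwise ToIEnumerable's initial MoveNext would skip it.
        var rest = hasCurrent ? e.ToIEnumerable().Prepend(e.Current) : Enumerable.Empty<T>();
        return (first, rest);
    }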
}

Note that this has to iterate over and buffer first before it can return (what if you tried to enumerate rest before first?) and you can't call Reset on rest and expect anything reasonable. Fixing these issues would involve a lot more code.

I can dimly see in the distance some type of extended LINQ where you pass Actions and Funcs and do something like continuations (the rest of the IEnumerable) to process, but I am not sure it is worth it. Something like:

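public static IEnumerable<T> DoWhile<T>(this IEnumerable<T> src, Func<T,bool> whileFn, Action<T> doFn) {
    var e = src.GetEnumerator();
    bool hasCurrent;
    while ((hasCurrent = e.MoveNext()) && whileFn(e.Current))
        doFn(e.Current);

    // Keep the item that ended the loop; ToIEnumerable alone would skip past it.
    return hasCurrent ? e.ToIEnumerable().Prepend(e.Current) : Enumerable.Empty<T>();
}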

which you could use like:

while (data.Any()) {
    var y = data.First().Year;

    var ct = 0;
    data = data.DoWhile(d => d.Year == y, d => ++ct);
    
    Console.WriteLine($"{ct} items in {y}");
}

The best answer is to stop using the IEnumerable<T> automatic enumeration and manually enumerate:

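using (var e = data.GetEnumerator()) {
    var hasItem = e.MoveNext();
    while (hasItem) {
        var y = e.Current.Year;

        var ct = 0;
        // Count the run of items with this year; stop at the first different year or at the end.
        do {
            ++ct;
            hasItem = e.MoveNext();
        } while (hasItem && e.Current.Year == y);

        Console.WriteLine($"{ct} items in {y}");
    }
}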

Once you are doing manual enumeration, you can handle most any circumstance without losing efficiency to buffering, or delegate calls for your specific needs.
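
If the same pattern is needed in more than one place, the manual enumeration can be wrapped in a reusable iterator. The following is only a sketch: CountRuns and its keySelector parameter are hypothetical names, and it assumes that "a run" means adjacent items whose keys are equal under the default equality comparer.

```csharp
using System;
using System.Collections.Generic;

var years = new[] { 2019, 2019, 2020, 2021, 2021, 2021 };
foreach (var (year, count) in years.CountRuns(y => y))
    Console.WriteLine($"{count} items in {year}");

public static class RunExtensions
{
    // Hypothetical helper: streams (key, count) for each run of adjacent equal keys.
    public static IEnumerable<(TKey Key, int Count)> CountRuns<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        using var e = source.GetEnumerator();
        var hasItem = e.MoveNext();
        while (hasItem)
        {
            var key = keySelector(e.Current);
            var count = 0;
            do
            {
                count++;
                hasItem = e.MoveNext();
            } while (hasItem && comparer.Equals(keySelector(e.Current), key));

            // Only the current key and a counter are held in memory.
            yield return (key, count);
        }
    }
}
```

The source is enumerated exactly once and nothing is buffered, so this also works for sequences that can only be read a single time.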

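PS: Note that testing data.Count() against 0 is very inefficient; you should always be using data.Any(). Depending on data, data.Count() may be very expensive or may never return (for an infinite sequence), and with a sequence that can only be enumerated once, even data.Any() may consume the item that data.First() was supposed to return.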
