Home > Software engineering >  Why doesn't LINQ implement ICollection<T> (or other interfaces) when possible?
Why doesn't LINQ implement ICollection<T> (or other interfaces) when possible?

Time:11-02

Consider this example:

List<int> GetLengths(ICollection<string> strings) =>
    strings
        .Select(s => s.Length)
        .ToList();

If Select() checked that the input collection implements ICollection<T> and that its length is known in advance, it could have returned ICollection<T> too, so that ToList() would initialize the resulting list with capacity. Instead, the list adds each value one by one, expanding its storage log(N) times.

Is there any reason it's not done so in LINQ?

Update: since there are many questions about my suggestion, here are some justifications for my concept:

  1. LINQ already returns many different implementation of IEnumerable<T>. IMO there is nothing wrong with adding another iterator that implements one more interface.
  2. A read-only ICollection<T> doesn't have to be materialized in memory, it only has to have Count. Here is an example of a bare implementation of ICollection<T> that behaves similar to Enumerable.Repeat<T>() except it calls a delegate to generate each element. It throws exceptions left and right, but so the stock ReadOnlyCollection<T> does.
  3. The List<T>(IEnumerable<T> collection) constructor already checks if collection also implements ICollection<T>, so that the list can allocate its storage in advance. It doesn't violate any interfaces or conventions.
  4. From the architectural point of view, implementing IReadOnlyCollection<T> would make more sense, but unfortunately, it's often neglected in the BCL itself, and the List<T> constructor doesn't check it.

CodePudding user response:

Optimizing the source.Select().ToList(), as well as the source.Select().ToDictionary(), was discussed a few years ago (2015) in this GitHub issue: System.Linq performance improvement suggestions

The suggestions were reviewed, and the decision was to not implement the optimizations. Here is an excerpt from the last post in the thread, by Stephen Toub, Sep 15, 2015.

Through this thread as well as a bunch of PRs that have happened in the interim, we did agree that we don't want to expose existing collection interfaces out of LINQ, e.g. the object returned by Select shouldn't implement ICollection<T>, but we are ok with and have introduced additional internal interfaces to help curry certain information through the built-in operators, e.g. the iterator implemented by Select implements IArrayProvider so that its ToArray implementation can be optimized to work on the original array with complete knowledge of its size, using its indexer rather than going through IEnumerable, etc.

At this point, given the original tasks called out on this thread, it looks like the only concrete remaining items would be around specializing methods like ToDictionary to have more knowledge of the original source. I suggest that if someone wants to implement such optimizations and do detailed measurements, such optimizations with associated measurements could be submitted as individual PRs, and an issue like this isn't needed to track them. So I'm going to leave this closed and not open new issues. Obviously folks should feel free to open subsequent issues / PRs for individual topics to be discussed / solutions implemented.

IMHO these internal LINQ optimizations are very much desirable.


Update: Apparently the source.Select().ToList() and source.Select().ToArray() operations have been optimized at a later stage of .NET's evolution, as can be demonstrated experimentally with this demo:

byte[] source = new byte[1_000_000];
long mem0 = GC.GetTotalAllocatedBytes(true);
List<byte> result = source.Select(n => (byte)1).ToList();
long mem1 = GC.GetTotalAllocatedBytes(true);
Console.WriteLine($"Allocated: {mem1 - mem0:#,0} bytes");

Output on .NET 6, Release mode:

Allocated: 1,000,232 bytes

Online demo

So the byte array was projected and then materialized to a List<byte> in one go, without log(N) allocations and expansions.

CodePudding user response:

If Select() checked that the input collection implements ICollection<T> and that its length is known in advance, it could have returned ICollection<T> too

Whilst it could, that would mean Select would have to immediately return a materialised collection.

Select returns an iterator, which can be passed through other LINQ methods (e.g. Where), whilst still requiring only a single iteration of the source.

Returning a collection at each stage would likely be less efficient in many queries.

  • Related