Home > Back-end >  Why is the Enumerable.Any(Func<TSource, bool> predicate) slow compared to a foreach with an if
Why is the Enumerable.Any(Func<TSource, bool> predicate) slow compared to a foreach with an if

Time:11-23

Something has piqued my curiosity recently..

Why is the Enumerable.Any(Func<TSource, bool> predicate) method so much slower than manual foreach, when they do the same thing?

I've been messing with some benchmarks and thought of this. I'm checking of a List<int> contains and item that's approximately in the half of the list.

Here are my test results for a few diffent sizes of the list:

Items: 1 000, searched item: 543

Method Mean Ratio Allocated Alloc Ratio
Foreach 838.3 ns 1.00 - NA
Any 3,348.8 ns 4.05 40 B NA

Items: 10 000, searched item: 5 432

Method Mean Ratio Allocated Alloc Ratio
Foreach 7.988 us 1.00 - NA
Any 30.991 us 3.88 40 B NA

Items: 100 000, searched item: 54 321

Method Mean Ratio Allocated Alloc Ratio
Foreach 82.35 us 1.00 - NA
Any 328.86 us 4.00 40 B NA

There are two benchmarks:

  • Foreach: manual foreach with an if statement
  • Any: LINQ's Any method (that turns into Enumerable.Any)

Here's my code for the benchmarks (using BenchmarkDotNet, .NET 6.0 console app running in Release mode):

[MemoryDiagnoser(displayGenColumns: false)]
[HideColumns("Error", "StdDev", "RatioSD")]
public class Benchmarks
{
    private readonly List<int> _items;
    private readonly Func<int, bool> _filter;

    public Benchmarks()
    {
        _items = Enumerable.Range(1, 10_000).ToList();
        _filter = x => x == 5432;
    }

    [Benchmark(Baseline = true)]
    public bool Foreach()
    {
        if (_items is null)
        {
            throw new ArgumentNullException(nameof(_items));
        }

        if (_filter is null)
        {
            throw new ArgumentNullException(nameof(_filter));
        }

        foreach (var item in _items)
        {
            if (_filter(item))
            {
                return true;
            }
        }

        return false;
    }

    [Benchmark]
    public bool Any()
    {
        return _items.Any(_filter);
    }
}

The Any approach is 4 times slower and allocates a bit of memory despite my best attempts to optimize it.

I tried to make the Any approach faster by caching the predicate (Func<int, bool>) in a variable (_filter). However, it still allocates 40B and I have no idea why...

When decompiled, the Any approach turns into Enumerable.Any(Func<TSource, bool> predicate) method:

public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
    }

    if (predicate == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.predicate);
    }

    foreach (TSource element in source)
    {
        if (predicate(element))
        {
            return true;
        }
    }

    return false;
}

How is the Any approach different from the Foreach approach? Just curious...

EDIT

As Jon Skeet suggested, I tried changing the _items collection from a List<int> to IEnumerable<int> to make the comparison fair. In short, that seems to be the key difference. My Foreach seems to be taking advantage of that fact that it know the _items collection is a List<T> while the Enumerable.Any method takes an IEnumerable<int>.

Here are the benchmark results for that:

Items: 1 000, searched item: 543

Method Mean Ratio Allocated Alloc Ratio
Foreach 2.126 us 1.00 40 B 1.00
Any 2.131 us 1.00 40 B 1.00

Items: 10 000, searched item: 5 432

Method Mean Ratio Allocated Alloc Ratio
Foreach 21.35 us 1.00 40 B 1.00
Any 21.20 us 0.99 40 B 1.00

Items: 100 000, searched item: 54 321

Method Mean Ratio Allocated Alloc Ratio
Foreach 220.7 us 1.00 40 B 1.00
Any 219.1 us 0.99 40 B 1.00

When working with IEnumerable<int>, the two approaches are performing the same. Thanks Jon Skeet!

CodePudding user response:

Compiler can optimize working with the List<T> inside foreach (compared to IEnumerable<T>). I would not be able to explain in the details, but if you check the generated IL (for example at sharplab.io) you already will see the differences - compiler can call concrete methods on List<T>.Enumerator instead polymorphic invocation via callvirt (Call and Callvirt). Not sure that this (and one time allocation due to working with struct List<T>.Enumerator via interface) results in such performance difference. Possibly runtime can optimize it even more (check out the JIT Asm difference at sharplab.io if you want to try going deeper).

If you check the source code for Enumerable.Any you will see that it uses the same foreach loop and difference boils down to using IEnumerable interface:

public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
    }
 
    if (predicate == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.predicate);
    }
 
    foreach (TSource element in source)
    {
        if (predicate(element))
        {
            return true;
        }
    }
 
    return false;
}

So, as correctly diagnosed by @Jon Skeet in the comments, the difference comes from using list vs enumerable.

  • Related