Something has piqued my curiosity recently..
Why is the Enumerable.Any(Func<TSource, bool> predicate)
method so much slower than manual foreach, when they do the same thing?
I've been messing with some benchmarks and thought of this. I'm checking of a List<int>
contains and item that's approximately in the half of the list.
Here are my test results for a few diffent sizes of the list:
Items: 1 000, searched item: 543
Method | Mean | Ratio | Allocated | Alloc Ratio |
---|---|---|---|---|
Foreach | 838.3 ns | 1.00 | - | NA |
Any | 3,348.8 ns | 4.05 | 40 B | NA |
Items: 10 000, searched item: 5 432
Method | Mean | Ratio | Allocated | Alloc Ratio |
---|---|---|---|---|
Foreach | 7.988 us | 1.00 | - | NA |
Any | 30.991 us | 3.88 | 40 B | NA |
Items: 100 000, searched item: 54 321
Method | Mean | Ratio | Allocated | Alloc Ratio |
---|---|---|---|---|
Foreach | 82.35 us | 1.00 | - | NA |
Any | 328.86 us | 4.00 | 40 B | NA |
There are two benchmarks:
- Foreach: manual
foreach
with anif
statement - Any: LINQ's
Any
method (that turns intoEnumerable.Any
)
Here's my code for the benchmarks (using BenchmarkDotNet, .NET 6.0 console app running in Release mode):
[MemoryDiagnoser(displayGenColumns: false)]
[HideColumns("Error", "StdDev", "RatioSD")]
public class Benchmarks
{
private readonly List<int> _items;
private readonly Func<int, bool> _filter;
public Benchmarks()
{
_items = Enumerable.Range(1, 10_000).ToList();
_filter = x => x == 5432;
}
[Benchmark(Baseline = true)]
public bool Foreach()
{
if (_items is null)
{
throw new ArgumentNullException(nameof(_items));
}
if (_filter is null)
{
throw new ArgumentNullException(nameof(_filter));
}
foreach (var item in _items)
{
if (_filter(item))
{
return true;
}
}
return false;
}
[Benchmark]
public bool Any()
{
return _items.Any(_filter);
}
}
The Any approach is 4 times slower and allocates a bit of memory despite my best attempts to optimize it.
I tried to make the Any approach faster by caching the predicate (Func<int, bool>
) in a variable (_filter
). However, it still allocates 40B and I have no idea why...
When decompiled, the Any approach turns into Enumerable.Any(Func<TSource, bool> predicate)
method:
public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if (source == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
}
if (predicate == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.predicate);
}
foreach (TSource element in source)
{
if (predicate(element))
{
return true;
}
}
return false;
}
How is the Any approach different from the Foreach approach? Just curious...
EDIT
As Jon Skeet suggested, I tried changing the _items
collection from a List<int>
to IEnumerable<int>
to make the comparison fair. In short, that seems to be the key difference. My Foreach seems to be taking advantage of that fact that it know the _items
collection is a List<T>
while the Enumerable.Any
method takes an IEnumerable<int>
.
Here are the benchmark results for that:
Items: 1 000, searched item: 543
Method | Mean | Ratio | Allocated | Alloc Ratio |
---|---|---|---|---|
Foreach | 2.126 us | 1.00 | 40 B | 1.00 |
Any | 2.131 us | 1.00 | 40 B | 1.00 |
Items: 10 000, searched item: 5 432
Method | Mean | Ratio | Allocated | Alloc Ratio |
---|---|---|---|---|
Foreach | 21.35 us | 1.00 | 40 B | 1.00 |
Any | 21.20 us | 0.99 | 40 B | 1.00 |
Items: 100 000, searched item: 54 321
Method | Mean | Ratio | Allocated | Alloc Ratio |
---|---|---|---|---|
Foreach | 220.7 us | 1.00 | 40 B | 1.00 |
Any | 219.1 us | 0.99 | 40 B | 1.00 |
When working with IEnumerable<int>
, the two approaches are performing the same. Thanks Jon Skeet!
CodePudding user response:
Compiler can optimize working with the List<T>
inside foreach (compared to IEnumerable<T>
). I would not be able to explain in the details, but if you check the generated IL (for example at sharplab.io) you already will see the differences - compiler can call concrete methods on List<T>.Enumerator
instead polymorphic invocation via callvirt
(Call and Callvirt). Not sure that this (and one time allocation due to working with struct List<T>.Enumerator
via interface) results in such performance difference. Possibly runtime can optimize it even more (check out the JIT Asm difference at sharplab.io if you want to try going deeper).
If you check the source code for Enumerable.Any
you will see that it uses the same foreach
loop and difference boils down to using IEnumerable
interface:
public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if (source == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
}
if (predicate == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.predicate);
}
foreach (TSource element in source)
{
if (predicate(element))
{
return true;
}
}
return false;
}
So, as correctly diagnosed by @Jon Skeet in the comments, the difference comes from using list vs enumerable.