Back in the days, I was always told to use AddRange
whenever possible because of the performance gain over Add
. Today, I wanted to validate this statement prior to using it with the output of a List<T>.Select(x => ...)
call but from the decompiled code of List<T>
, I am under the impression the foreach {add}
loop should be faster.
The main reasons I see are the numerous additional checks that are done in the process since it is not an ICollection
- null-check on the new items collection
- boundary check on the index (since AddRange is in fact a call to InsertRange with index
_size
- try type cast to
ICollection
- type check to know if it is an
ICollection
(which it isn't after theselect
call) - another boundary check when calling
Insert
- capacity check (this one also exists in
Add
) - position check (since
Insert
can also put data prior or within the actual list)
Has anyone ever done reliable benchmarking on this?
Edit: Added Code samples of the two options
var clientsConfirmations = new List<ClientConfirmation>();
foreach (var client in Clients)
{
bool flag = false;
if (client.HasFlag.HasValue)
{
flag = client.HasFlag.Value;
}
var clientConf = new ClientConfirmation(client.Id, client.Name, flag);
clientsConfirmations.Add(clientConf);
}
Versus
var clientsConfirmations = new List<ClientConfirmation>();
clientsConfirmations.AddRange(
Clients.Select(client =>
new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false)
)
);
CodePudding user response:
If you call somelist.AddRange(List<T>.Select(...))
it is not an ICollection<T>
then it won't benefit from the optimized path where it can use a single resize and array.copy.
However if it would be somelist.AddRange(List<T>.Select(...).ToList())
it might use it but you have to pay for the extra ToList()
and that in total would still be worse because it creates an extra list which would be equivalent to calling AddRange
on an empty list.
CodePudding user response:
AddRange
is faster with ICollection<T>
because it can check the size of any ICollection<T>
argument and allocate a buffer large enough to hold all items from the start. This isn't the case in either example.
Right now, both options are equally slow, because both enumerate over an IEnumerable<T>
of unknown size and add items to the buffer one by one. Every time the internal buffer is full, a new one is allocated with double the size, the old data is copied over, and the old buffer is orphaned and eventually garbage-collected.
The intial buffer holds 1 item. Adding a lot of items one by one ends up allocating log2(N) temporary buffers and taking up 2 times the memory the final buffer does.
This is the case even with the second snippet, which hides what's actually going on.
var clientsConfirmations = new List<ClientConfirmation>();
clientsConfirmations.AddRange(
Clients.Select(client =>
new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false)
)
);
Is actually
var clientsConfirmations = new List<ClientConfirmation>();
IEnumerable<ClientConfirmation> results=Clients.Select(client =>
new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false);
clientsConfirmations.AddRange(results);
A more readable (but equally slow) equivalent is :
var confirmations=Clients.Select(client => new ClientConfirmation(
client.Id,
client.Name,
client.HasFlag ?? false)
.ToList();
ToList()
will do what both snippets do - create a new List and add items one by one.
To get better performance the List<T>(int capacity)
constructor should be used to create a list with a large enough initial buffer. This doesn't have to be accurate. Even a rough guess will reduce allocations and copies. Using 32 as the capacity will save 5 allocations:
var confirmations = new List<ClientConfirmation>(32);
var results=Clients.Select(client =>
new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false);
confirmations.AddRange(results);