I have a very large List<int>
full of integers. I want to remove every occurrence of "3" from the List<int>
. Which technique would be most efficient to do this? I would normally use the .Remove(3)
extension until it returns false
, but I fear that each call to .Remove(3)
internally loops through the entire List<int>
unnecessarily.
EDIT: It was recommended in the comments to try
TheList = TheList.Where(x => x != 3).ToList();
but I need to remove the elements without instantiating a new List.
var TheList = new List<int> { 5, 7, 8, 2, 8, 3, 1, 0, 6, 3, 9, 3, 5, 2, 7, 9, 3, 5, 5, 1, 0, 4, 5, 3, 5, 8, 2, 3 };
//technique 1
//this technique has the shortest amount of code,
//but I fear that every time the Remove() method is called,
//the entire list is internally looped over again starting at index 0
while (TheList.Remove(3)) { }
//technique 2
//this technique is an attempt to keep the keep the list from
//being looped over every time an element is removed
for (var i = 0; i < TheList.Count; i )
{
if (TheList[i] == 3)
{
TheList.RemoveAt(i);
i--;
}
}
Are there any better ways to do this?
Benchmarks
I tested three techniques to remove 10,138 from an array with 100,000 elements: the two shown above, and one recommended by Serg in an answer. These are the results:
- 'while' loop: 179.6808ms
- 'for' loop: 65.5099ms
- 'RemoveAll' predicate: 0.5982ms
Benchmark Code:
var RNG = new Random();
//inclusive min and max random number
Func<int, int, int> RandomInt = delegate (int min, int max) { return RNG.Next(min - 1, max) 1; };
var TheList = new List<int>();
var ThreeCount = 0;
for (var i = 0; i < 100000; i )
{
var TheInteger = RandomInt(0, 9);
if (TheInteger == 3) { ThreeCount ; }
TheList.Add(TheInteger);
}
var Technique1List = TheList.ToList();
var Technique2List = TheList.ToList();
var Technique3List = TheList.ToList();
<div style="background-color:aquamarine;color:#000000;">Time to remove @ThreeCount items</div>
//technique 1
var Technique1Stopwatch = Stopwatch.StartNew();
while (Technique1List.Remove(3)) { }
var Technique1Time = Technique1Stopwatch.Elapsed.TotalMilliseconds;
<div style="background-color:#ffffff;color:#000000;">Technique 1: @(Technique1Time)ms ('while' loop)</div>
//technique 2
var Technique2Stopwatch = Stopwatch.StartNew();
for (var i = 0; i < Technique2List.Count; i )
{
if (Technique2List[i] == 3)
{
Technique2List.RemoveAt(i);
i--;
}
}
var Technique2Time = Technique2Stopwatch.Elapsed.TotalMilliseconds;
<div style="background-color:#ffffff;color:#000000;">Technique 2: @(Technique2Time)ms ('for' loop)</div>
//technique 3
var Technique3Stopwatch = Stopwatch.StartNew();
var RemovedCount = Technique3List.RemoveAll(x => x == 3);
var Technique3Time = Technique3Stopwatch.Elapsed.TotalMilliseconds;
<div style="background-color:#ffffff;color:#000000;">Technique 3: @(Technique3Time)ms ('RemoveAll' predicate)</div>
CodePudding user response:
You can just use List<T>.RemoveAll
and pass your predicate - https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.removeall?view=net-6.0#System_Collections_Generic_List_1_RemoveAll_System_Predicate__0__ . This guaranteed to be linear complexity O(list.Count)
TheList.RemoveAll(x=>x==3);
Additionally, RemoveAll
performs some GC-specific things internally, so I think in some cases this may provide some additional performance advantages against the simple hand-made loop implementation (but I'm unsure here).
If you want to do it all yourself, you can check out the implementation of RemoveAll
here. Generally, it is just a while
loop as in your question.
Additionally, as we can see from GitHub implementation (and as Jon Skeet mentioned in the comment) the remove operation causes the rest of list (all items after the first removed items) to be copied (shifted) on the free space, intorduced by deletion. So, if you have really huge list and/or want to remove something frequently, you may consider to switching to some other data structure, such as linked list.