I'm trying to add a lot of data to a List, but it seems to use a lot more RAM than an array does. I was wondering why that is and if there's a better solution.
This solution with an array takes about 78 MB of RAM. Makes sense since 4 byte * 20000000 ~= 76 MB:
float[] arrayValues = new float[20000000];
for (int i = 0; i < 20000000; i )
arrayValues[i] = i;
But this solution with a list takes 206 MB (!!):
List<float> listValues = new();
for (int i = 0; i < 20000000; i )
listValues.Add(i);
How can that be? It's basically doing the same thing - saving 20000000 float values. Where is that additional 128 MB coming from? And is there a better way that doesn't produce so much overhead?
CodePudding user response:
When you Add
new items into a List<T>
it has to do memory reallocation to have enough space for these new items.
Let's have a look at the process:
List<float> listValues = new();
int capacity = listValues.Capacity;
for (int i = 0; i < 20000000; i ) {
listValues.Add(i);
if (capacity != listValues.Capacity) {
capacity = listValues.Capacity;
Console.WriteLine(capacity);
}
}
Outcome:
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432 // <- Finally, list allocates memory for 33554432 items
As you can see, now 33554432
items are allocated and 4 8 16 ... 16777216
are garbage. In the worst case we have
33554432
allocated items and 33554432
garbage items; in total 33554432 33554432 = 67108864 ~ 3 * 20000000
and you can see this 3 factor.
What can you do?
Specify the Capacity
in order to avoid realloaction (typical solution):
// We can avoid all this mess with reallocations
// by specifing required capacity: 20000000 items in our case
List<float> listValues = new(20000000);
for (int i = 0; i < 20000000; i ) {
listValues.Add(i);
}
Collect all garbage before measuring:
// Business as usual
List<float> listValues = new();
for (int i = 0; i < 20000000; i ) {
listValues.Add(i);
}
// Collect garbage to measure real List efficency:
// List allocates 33554432 items vs. 20000000 in case of array
// About ~70% overhead
GC.Collect(2);