Converting a list of int to a list of string takes too much memory in Python

I need to store a list of integers as a string; for example, [1, 2, 3] is stored as '1;2;3'. To do this, I first convert the list of integers to a list of strings, but the memory usage of this conversion is huge.

Sample code that shows the problem:

from sys import getsizeof
import tracemalloc

tracemalloc.start()

curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))

print()

list_int = [1]*int(1e6)

curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))
print(f'Size of list_int: {getsizeof(list_int)/1e6} MB')

print()

list_str = [str(i) for i in list_int]

curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))
print(f'Size of list_str: {getsizeof(list_str)/1e6} MB')

Output:

Current: 0 MB
Peak: 0 MB

Current: 8 MB
Peak: 8 MB
Size of list_int: 8.000056 MB

Current: 66 MB
Peak: 66 MB
Size of list_str: 8.448728 MB

The memory taken by both lists is similar (8 MB), but the memory used by the program during conversion is huge (66 MB).

How can I solve this memory issue?

Edit: What I ultimately need is a single string, so I will run ';'.join(list_str) at the end. Even if I use a generator or iterable, say list_str = map(str, list_int), the memory usage comes out the same.

CodePudding user response：

Use NumPy instead. Try this:

from sys import getsizeof
import tracemalloc
import numpy as np

tracemalloc.start()

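# Preallocate a fixed-width string array. np.str was removed in NumPy 1.24,
# so the dtype is given as 'U1': one character per element. Longer numbers
# would be silently truncated; use a wider dtype (e.g. 'U20') for them.
arr = np.empty(1000000, dtype='U1')
# Fill by position; indexing with the value itself would only ever write arr[1].
for idx, value in enumerate([1]*int(1e6)):
    arr[idx] = str(value)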

curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))
print(f'Size of list_str: {getsizeof(list(arr))/1e6} MB')

Output, with a bit of improvement, I think:

Current: 4 MB
Peak: 12 MB
Size of list_str: 9.000112 MB

CodePudding user response：

After some thought, I think the result Size of list_str: 8.448728 MB is misleading; the true size of list_str is much larger: 58.70 MB.

If you read the doc about getsizeof carefully, you will find the following:

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

That is, getsizeof does not count the size of the contents, i.e., the strings in the list.

Using the recursive sizeof recipe that the getsizeof documentation points to, you can find that the true total size of the list [str(i) for i in [1] * int(1e6)] is about 58.70 MB.

Now add this to the size (8 MB) of the other list you are holding, [1] * int(1e6), and you get roughly the 66 MB that you observe.
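The accounting above can be approximated with a short sketch. It is not the recipe from the documentation; total_size_of_list is a hypothetical helper that only handles a flat list, and the exact byte counts depend on the Python version and platform:

```python
from sys import getsizeof

def total_size_of_list(items):
    """Shallow size of the list plus the size of each distinct element.

    Hypothetical helper for illustration; it does not recurse into
    nested containers.
    """
    # Deduplicate by id() so an object referenced many times is counted once.
    distinct = {id(item): item for item in items}
    return getsizeof(items) + sum(getsizeof(item) for item in distinct.values())

list_int = [1] * 100_000                 # every slot refers to the same int object
list_str = [str(i) for i in list_int]    # typically one new str object per slot

print(f'list_int: shallow {getsizeof(list_int)} B, total {total_size_of_list(list_int)} B')
print(f'list_str: shallow {getsizeof(list_str)} B, total {total_size_of_list(list_str)} B')
```

The shallow figures cover only the list's array of pointers, which is why the two lists look alike to getsizeof even though list_str usually owns far more memory through its elements.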

Therefore, my answer is that as long as you want to keep the list of strings, there is no better way, since no excess memory is actually used along the way.
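If the list of strings is only a stepping stone to the joined string, as the edit to the question suggests, one possible workaround is to convert and join in chunks, so that only a small batch of temporary strings exists at any moment. This is a minimal sketch; join_ints and chunk_size are hypothetical names that do not appear in the question, and the actual savings should be checked with tracemalloc on your own data:

```python
import io
from itertools import islice

def join_ints(values, sep=';', chunk_size=10_000):
    """Join integers into one string without building a full list of strings.

    join_ints and chunk_size are illustrative, not part of any library.
    """
    iterator = iter(values)
    buffer = io.StringIO()
    first_chunk = True
    while True:
        # Take at most chunk_size items; only these are converted at once.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        if not first_chunk:
            buffer.write(sep)  # separator between consecutive chunks
        buffer.write(sep.join(map(str, chunk)))
        first_chunk = False
    return buffer.getvalue()

print(join_ints([1, 2, 3]))  # prints 1;2;3
```

A plain ';'.join(map(str, list_int)) does not help here, because join first collects a non-list iterable into a list, so all the temporary strings still exist at once. Chunking bounds the number of temporaries at chunk_size, at the cost of a few extra calls.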