I need to store integers as a string, e.g. [1,2,3] will be stored as '1;2;3'. To do this, I first need to convert the list of integers to a list of strings, but the memory usage for this conversion is huge.
Sample code that shows the problem:
from sys import getsizeof
import tracemalloc
tracemalloc.start()
curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))
print()
list_int = [1]*int(1e6)
curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))
print(f'Size of list_int: {getsizeof(list_int)/1e6} MB')
print()
list_str = [str(i) for i in list_int]
curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))
print(f'Size of list_str: {getsizeof(list_str)/1e6} MB')
Output:
Current: 0 MB
Peak: 0 MB
Current: 8 MB
Peak: 8 MB
Size of list_int: 8.000056 MB
Current: 66 MB
Peak: 66 MB
Size of list_str: 8.448728 MB
The memory taken by both lists is similar (8 MB), but the memory used by the program during conversion is huge (66 MB).
How can I solve this memory issue?
Edit: What I need in the end is a single string, so I will run ';'.join(list_str) at the end. So even if I use a generator/iterable, say list_str = map(str, list_int), the memory usage comes out the same.
CodePudding user response:
Use NumPy instead. Try this:
from sys import getsizeof
import tracemalloc
import numpy as np
tracemalloc.start()
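# np.str was removed in NumPy 1.24; 'U1' is a fixed-width, 1-character string dtype
# (longer values would be silently truncated, so widen it for larger integers)
arr = np.ones((1000000,), dtype='U1')
# enumerate so that every slot is written, not only arr[1]
for idx, i in enumerate([1]*int(1e6)):
    arr[idx] = str(i)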
curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))
print(f'Size of list_str: {getsizeof(list(arr))/1e6} MB')
Output, with a bit of improvement, I think:
Current: 4 MB
Peak: 12 MB
Size of list_str: 9.000112 MB
CodePudding user response:
After some thought, I think that the result Size of list_str: 8.448728 MB is misleading; the true size of list_str is actually larger: 58.70 MB.
If you read the doc about getsizeof carefully, you will find the following:
Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
That is, getsizeof does not count the size of the contents, i.e., the strings in the list.
Using the method proposed there, you can find that the true total size of the list [str(i) for i in [1] * int(1e6)] is about 58.70 MB.
Now add this to the total size (8 MB) of the other list you are holding, [1] * int(1e6), and you get the 66 MB that you observe.
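To make the getsizeof point concrete, here is a minimal sketch of a "deep" size for a flat list. It is a simplified stand-in for the recursive recipe the doc refers to, not that recipe itself: the helper name total_size is hypothetical, and it only handles one level of nesting. Objects shared by several slots are counted once, which is why a list like [1] * n stays small while a list of freshly created strings does not.

```python
from sys import getsizeof

def total_size(container):
    """Shallow size of a flat list plus the sizes of its distinct elements.

    Hypothetical helper for illustration; it does not recurse into nested
    containers.
    """
    seen = set()
    total = getsizeof(container)
    for item in container:
        # Count an object shared by several slots only once.
        if id(item) not in seen:
            seen.add(id(item))
            total += getsizeof(item)
    return total

list_int = [1] * int(1e5)
list_str = [str(i) for i in range(1000, 1000 + int(1e5))]

# getsizeof alone reports only the list's own pointer array.
print(f'list_int: shallow {getsizeof(list_int)} B, total {total_size(list_int)} B')
print(f'list_str: shallow {getsizeof(list_str)} B, total {total_size(list_str)} B')
```

The exact byte counts depend on the Python version and platform, so compare the shallow and total figures rather than relying on specific numbers.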
Therefore, my answer is that as long as you want the list of strings, there is no better way to do it, since no excess memory is actually used along the way.
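If the list of strings is only an intermediate step toward the joined string, as the edit to the question suggests, one option is to convert and join in chunks so that only a small batch of string objects exists at any time. The following is a sketch under that assumption, not a measured solution: the function name join_ints_chunked and the chunk_size default are made up for illustration, and the actual peak should be checked with tracemalloc as in the question.

```python
from itertools import islice

def join_ints_chunked(values, sep=';', chunk_size=10_000):
    """Join integers into one string without building one big list of str.

    Hypothetical helper: only chunk_size temporary strings are alive at a
    time; the per-chunk results are joined once at the end.
    """
    it = iter(values)
    parts = []
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        # The temporary strings for this chunk are freed after the join.
        parts.append(sep.join(map(str, chunk)))
    return sep.join(parts)

print(join_ints_chunked([1, 2, 3]))
```

The trade-off is that the chunk results and the final string coexist briefly, so the peak is roughly twice the size of the final string plus one chunk, rather than one string object per integer.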