Optimal memory storage, nested lists vs. flat lists

Time:06-11

I have a fairly large amount of data that needs to be stored in memory in Python, and I'm trying to work out how to save memory, as I keep running out of RAM.

I have restricted myself to basic Python types like lists, dicts and tuples, as I have found that these often have a huge speed advantage when I need to read/write the data.

How much am I penalized memory-wise for organizing my data in nested lists/dicts/tuples vs. just one flat list/dict/tuple?

Nested example:

[
    [
        [ a ],
        [ b ],
        [ c ]
    ],
    [
        [ d ],
        [ e ],
        [ f ]
    ],
]

Flat list:

[ a, b, c, d, e, f ]
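A rough way to measure the cost of nesting is to total up `sys.getsizeof` over every container and value. This is only an approximation: `deep_sizeof` is a hypothetical helper (it only recurses into lists, tuples and dicts), the data is made up, and exact byte counts depend on the CPython version and platform.

```python
import sys


def deep_sizeof(obj, seen=None):
    """Approximate total bytes used by obj plus everything it references.

    Only recurses into list, tuple and dict; each object is counted once.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)  # shallow size of this object only
    if isinstance(obj, (list, tuple)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    elif isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    return size


# Made-up data: 30,000 floats
values = [float(i) for i in range(30_000)]

# Flat layout: [a, b, c, d, e, f, ...]
flat = list(values)

# Nested layout like the example: groups of three one-element lists
nested = [[[v] for v in values[i:i + 3]] for i in range(0, len(values), 3)]

print("flat:  ", deep_sizeof(flat))
print("nested:", deep_sizeof(nested))
```

Both layouts refer to the same float objects, so the difference between the two numbers is purely the cost of the extra list objects: every inner `[ a ]` is a full list object with its own header, on top of the pointer (8 bytes on 64-bit CPython) that refers to it.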

1st edit: Data is a mix of string, float and int values.

2nd edit: Context, as requested: these are small datasets for use in a neural network. The data cannot readily be split up or handled in chunks, as that would impair the training process or require a large amount of the code to be rewritten. I have 32 GB of RAM available.

Answer:

If your data is all of the same type, especially a primitive type (int, float, character, but not str), try using numpy arrays. NumPy stores the values as raw numbers in one contiguous block of memory but lets you index it as if it were nested, and it will generally use much less memory than lists, because it avoids a separate Python object (and pointer) per element. Keep in mind, though, that this only applies to rectangular arrays (i.e., each sublist must have the same length).
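A minimal sketch of the idea, assuming the data is purely numeric and rectangular (the row/column counts and values here are made up). The list-size estimate only adds up the outer list, the row lists and the float objects, so treat it as approximate.

```python
import sys

import numpy as np

# Made-up numeric data: 100,000 rows x 3 columns of floats
rows, cols = 100_000, 3
nested = [[float(r * cols + c) for c in range(cols)] for r in range(rows)]

# One contiguous float64 buffer: 8 bytes per value, no per-value objects
arr = np.array(nested, dtype=np.float64)
print(arr.shape)   # (100000, 3)
print(arr.nbytes)  # 2400000 bytes of raw data

# Indexing still looks nested, although storage is a single flat block
print(arr[2, 1])   # same value as nested[2][1]

# Rough size of the list version: outer list + row lists + float objects
list_bytes = sys.getsizeof(nested) + sum(
    sys.getsizeof(row) + sum(sys.getsizeof(x) for x in row) for row in nested
)
print(list_bytes > arr.nbytes)  # True
```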

Answer:

In addition to what Kraigolas said here: if your data is numeric and you can store it as a flat list, you could use arrays from the `array` module, which store the values compactly as raw C types rather than as separate Python objects. The module is part of the standard library, so you won't add any dependencies (assuming you're not already using numpy).
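A minimal sketch with the standard-library `array` module, assuming all values are floats (typecode `"d"`, a C double). The `get` helper and the 3-column layout are hypothetical, just to show how a nested row/column view can be recovered from a flat array with index arithmetic.

```python
import sys
from array import array

# Made-up flat numeric data
values = [float(i) for i in range(30_000)]

# 'd' = C double; values are stored unboxed in one buffer
packed = array("d", values)
print(packed.itemsize)                # bytes per element (8 on common platforms)
print(len(packed) * packed.itemsize)  # raw data size in bytes

# The list holds a pointer per value plus a separate float object per value
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(sys.getsizeof(packed) < list_bytes)  # True

# Hypothetical "nested" view: row/column lookup via index arithmetic
cols = 3


def get(row, col):
    """Return the value at (row, col) of a row-major layout with `cols` columns."""
    return packed[row * cols + col]


print(get(2, 1))  # 7.0
```

The trade-off is that `array` only holds one numeric type per array, so mixed string/float/int data would have to be split into separate arrays (or kept in lists).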
