Read/write/update an object without loading it into memory

I have been trying out the klepto package to write, read, and update my object on the hard disk, aiming to avoid the "out of memory" issues I ran into when training my model on my dataset. From my understanding, klepto lets me store my data through a key-value mechanism. But I am not sure whether I can update the object directly when I load the data back from the klepto archive. For example, can I add a value to a stored list without loading the whole object into memory, so that I avoid the "out of memory" problem?

Here is a sample of how I save the data (please correct me if this is not the right way to set it up):

from klepto.archives import *
arch = file_archive('test.txt')
arch['a'] = [3,4,5,6,7]
arch.dump()
arch.pop('a')

CodePudding user response:

I'm the klepto author. If I understand what you want, it looks like you have set it up correctly. The critical keyword is cached. If you use cached=True, then the archive is constructed as an in-memory cache with a manually-synchronized file backend. If you use cached=False, then there's no in-memory cache... you just access the file archive directly.

Python 3.7.16 (default, Dec  7 2022, 05:04:27) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from klepto.archives import *
>>> arch = file_archive('test.txt', cached=True)
>>> arch['a'] = [3,4,5,6,7]
>>> arch.dump() # dump to file archive
>>> arch.pop('a') # delete from memory
[3, 4, 5, 6, 7]
>>> arch
file_archive('test.txt', {}, cached=True)
>>> arch.load('a') # load from file archive
>>> arch
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=True)
>>> 
>>> arch2 = file_archive('test.txt', cached=True)
>>> arch2
file_archive('test.txt', {}, cached=True)
>>> arch2.load() # load from file archive
>>> arch2
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=True)
>>> 
>>> arch3 = file_archive('test.txt', cached=False)
>>> arch3 # directly access file-archive
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=False)
>>>
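
The session above suggests a pattern for a cached=True archive: load one key from the file, use it, then pop it from memory before moving on. Below is a minimal sketch of that loop. The helper name visit_entries is hypothetical (not part of klepto), and it assumes only the load(key) and pop(key) calls shown in the session. Whether this really bounds memory use depends on how the file backend reads entries, which the session does not establish.

```python
from typing import Callable, Hashable, Iterable


def visit_entries(archive, keys: Iterable[Hashable], visit: Callable) -> list:
    """Apply `visit` to each stored entry, holding one entry in the cache at a time.

    `archive` is assumed to behave like a cached=True klepto archive:
    load(key) copies an entry from the file into memory, and pop(key)
    removes it from memory and returns it (the file copy is untouched).
    """
    results = []
    for key in keys:
        archive.load(key)          # file -> in-memory cache
        value = archive.pop(key)   # take it back out of the cache
        results.append(visit(value))
    return results


# Possible usage (assumes klepto is installed and 'test.txt' holds key 'a'):
#   from klepto.archives import file_archive
#   arch = file_archive('test.txt', cached=True)
#   visit_entries(arch, ['a'], sum)
```

Only the results of visit are kept, so the archive's memory cache is empty again once the loop finishes.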

You can also manipulate objects that are already in the archive. Unfortunately, with cached=False the object still has to be loaded into memory to be edited: in-archive editing is not implemented, so in a cached=False archive you can only replace an object, not modify it in place.

>>> arch2
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=True)
>>> arch2['a'].append(8) # edit the in-memory object
>>> arch2
file_archive('test.txt', {'a': [3, 4, 5, 6, 7, 8]}, cached=True)
>>> arch2.dump('a') # save changes to file-archive
>>> arch3
file_archive('test.txt', {'a': [3, 4, 5, 6, 7, 8]}, cached=False)
>>> 
>>> arch3['a'] = arch2['a'][1:] # replace directly in-file
>>> arch3
file_archive('test.txt', {'a': [4, 5, 6, 7, 8]}, cached=False)
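
The two update paths shown above can be wrapped in small helpers. This is only a sketch: the names edit_cached and replace_uncached are hypothetical, and the code assumes nothing beyond the calls that appear in the sessions (load, dump, pop, and item get/set). In both cases the object being edited is in memory while the edit runs.

```python
from typing import Callable, Hashable


def edit_cached(archive, key: Hashable, edit: Callable):
    """Update one entry of a cached=True archive and sync it to the file.

    Steps: load the entry into the memory cache, store the edited value,
    dump that key to the file backend, then pop it from memory.
    """
    archive.load(key)                   # file -> memory
    archive[key] = edit(archive[key])   # edit the in-memory copy
    archive.dump(key)                   # memory -> file
    return archive.pop(key)             # free the cached copy


def replace_uncached(archive, key: Hashable, edit: Callable):
    """Update one entry of a cached=False archive by replacing it.

    There is no in-archive editing, so the value is read, a new value
    is built from it, and the stored entry is overwritten.
    """
    new_value = edit(archive[key])      # read from the archive, build new value
    archive[key] = new_value            # replace the stored entry
    return new_value


# Possible usage (assumes klepto is installed and 'test.txt' holds key 'a'):
#   from klepto.archives import file_archive
#   edit_cached(file_archive('test.txt', cached=True), 'a', lambda v: v + [8])
#   replace_uncached(file_archive('test.txt', cached=False), 'a', lambda v: v[1:])
```

Passing the edit as a function that returns a new value keeps both helpers the same shape, which matters for the cached=False case where assignment is the only way to change what is stored.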