I have been trying out the klepto package to write, read, and update my objects on disk, aiming to avoid the "out of memory" issues I hit when training my model on my dataset. From my understanding, klepto lets me store my data with a key-value mechanism. But I am not sure whether I can update an object directly when I load the data back from klepto.archives. For example, can I add a value to a list without loading the whole object into memory, so that I avoid the "out of memory" problem?
Here is a sample of how I save the data (please correct me if this is not the right way to set it up):
from klepto.archives import *
arch = file_archive('test.txt')
arch['a'] = [3,4,5,6,7]
arch.dump()
arch.pop('a')
CodePudding user response:
I'm the klepto author. If I understand what you want, it looks like you have set it up correctly. The critical keyword is cached. If you use cached=True, then the archive is constructed as an in-memory cache with a manually-synchronized file backend. If you use cached=False, then there's no in-memory cache... you just access the file archive directly.
Python 3.7.16 (default, Dec 7 2022, 05:04:27)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from klepto.archives import *
>>> arch = file_archive('test.txt', cached=True)
>>> arch['a'] = [3,4,5,6,7]
>>> arch.dump() # dump to file archive
>>> arch.pop('a') # delete from memory
[3, 4, 5, 6, 7]
>>> arch
file_archive('test.txt', {}, cached=True)
>>> arch.load('a') # load from file archive
>>> arch
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=True)
>>>
>>> arch2 = file_archive('test.txt', cached=True)
>>> arch2
file_archive('test.txt', {}, cached=True)
>>> arch2.load() # load from file archive
>>> arch2
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=True)
>>>
>>> arch3 = file_archive('test.txt', cached=False)
>>> arch3 # directly access file-archive
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=False)
>>>
You can also manipulate objects that are already in the archive. Unfortunately, for cached=False, the object needs to be loaded into memory to be edited: in-archive editing is not implemented, so you can only replace objects in a cached=False archive.
>>> arch2
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=True)
>>> arch2['a'].append(8) # edit the in-memory object
>>> arch2
file_archive('test.txt', {'a': [3, 4, 5, 6, 7, 8]}, cached=True)
>>> arch2.dump('a') # save changes to file-archive
>>> arch3
file_archive('test.txt', {'a': [3, 4, 5, 6, 7, 8]}, cached=False)
>>>
>>> arch3['a'] = arch2['a'][1:] # replace directly in-file
>>> arch3
file_archive('test.txt', {'a': [4, 5, 6, 7, 8]}, cached=False)
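To append a value to a stored list with cached=False, the load, edit, and replace steps above can be wrapped in a small helper. This is only a sketch: append_in_archive and demo are hypothetical names, not part of klepto, and the helper relies on nothing beyond the item get and set shown in the session above. The entry being edited is still read into memory while it is modified.

```python
def append_in_archive(archive, key, value):
    """Append value to the list stored under key, then write the list back.

    Works with any mapping that supports item get and set, such as a
    file_archive opened with cached=False.
    """
    items = archive[key]   # read the stored entry into memory
    items.append(value)    # edit the in-memory copy
    archive[key] = items   # replace the stored entry with the edited copy
    return items


def demo(path='test.txt'):
    # Hypothetical usage: assumes klepto is installed and that the archive
    # at `path` already holds a list under the key 'a'.
    from klepto.archives import file_archive
    arch = file_archive(path, cached=False)
    return append_in_archive(arch, 'a', 9)
```

Calling demo() on the archive from the session above would leave 'a' holding the previous list with 9 appended.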