I have Octave code that gathers data from thousands of .csv files and stores it in a 4-dimensional matrix (800x8x80x213) so I can access it from other code. Reading in the data takes about 10 minutes, so I thought it would be a good idea to save the matrix and load it into the workspace whenever I wanted to work with the data, instead of waiting 10 minutes for the matrix to be rebuilt. I used save to save the matrix and load to load it into the workspace; however, loading the matrix took 30 minutes. Is there a better/faster way to save/load this 4-D matrix? It seems ridiculous that loading a matrix takes 3 times longer than creating it from 4000 files...
CodePudding user response:
The default 'format' option used by the save command is -text, which is human readable. For large datasets, a text file takes a long time to create and is much larger than the raw data, since every floating point number has to be written out as its text representation. It is therefore a poor fit for this kind of data. Loading from a large text file is slow for the same reason, especially on a slow computer, because every number has to be parsed back from text.
Octave also supports a -binary option, which is Octave's internal binary format. This is what you need. E.g.
save -binary outputfile.bin varname
In this particular case, the text file is 2.2 GB, whereas the binary file is the expected 872 MB (i.e. number of elements * 8 bytes per element). Saving and loading are near instant.
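As a rough illustration of the above, here is a minimal sketch that checks the size arithmetic and does a binary round trip. It uses a small random array as a stand-in for the real matrix; the variable name data and the temporary file name are assumptions made for the example, not part of the original code.

```octave
% Small stand-in for the 800x8x80x213 matrix (hypothetical variable name).
data = rand(80, 8, 8, 21);

% Raw payload of the full-size matrix: number of elements * 8 bytes per double.
full_bytes = 800 * 8 * 80 * 213 * 8;
printf("full-size payload: %d bytes\n", full_bytes);

% Save in Octave's binary format to a temporary file, then load it back.
fname = [tempname() ".bin"];
tic; save("-binary", fname, "data"); t_save = toc;
tic; loaded = load(fname);           t_load = toc;  % load detects the format itself

printf("save: %.3f s, load: %.3f s\n", t_save, t_load);

% The binary format stores doubles exactly, so the round trip should be lossless.
same = isequal(loaded.data, data);
delete(fname);
```

Calling load with an output argument returns a struct with one field per saved variable, which avoids overwriting anything already in the workspace.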
Alternatively, there are several other options corresponding to other common formats, e.g. -hdf5 (as a commenter has also mentioned here), or -v7, which is MATLAB's .mat format.
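For instance, a sketch of the same round trip using the -v7 option, assuming the file also needs to be readable from MATLAB. The variable and file names are again made up for the example. The -hdf5 option is used the same way, but only works if your Octave build includes HDF5 support.

```octave
% Small stand-in array (hypothetical variable name).
data = rand(20, 8, 8, 5);

% Save as a MATLAB v7 .mat file, then load it back into a struct.
fname = [tempname() ".mat"];
save("-v7", fname, "data");
s = load(fname);

% Values should survive the round trip unchanged.
ok = isequal(s.data, data);
delete(fname);
```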
Type help save on your Octave console for more details.