Saving data into an hdf5 file: necessary to create a new dataspace for every dataset?

Time:03-30

I have a 2D array that I need to write into an HDF5 file. The 2D array has to be split up into a set of 1D arrays, so every 1D array has the same size. At the moment, my C code creates a new dataspace for every one of those 1D arrays. To be more precise, my current approach is:

  1. Open (or create) file with H5Fcreate
  2. Create group with H5Gcreate2
  3. Create dataspace with H5Screate_simple
  4. Create dataset with H5Dcreate
  5. Write 1D array into dataset with H5Dwrite
  6. Close dataset identifier with H5Dclose
  7. Close dataspace identifier with H5Sclose
  8. Go back to 3. and continue with the next 1D array

After having a look at the documentation for H5Dcreate2, I realize that I might not need to close the dataspace after each 1D array and create a new one afterwards for the next 1D array, as those 1D arrays are all of the same size.

My question: is there any reason why I should create a new/separate dataspace for every single 1D array if all those 1D arrays have the same size (but different content)?

Answer:

Your conclusion is correct: there is no need to create a new/separate dataspace for every single 1D array you create. In other words, just reuse the same dataspace as many times as needed, as long as all these arrays have the same size. (The dataspace only describes the dimensions; the data type is passed to H5Dcreate separately.)

This is also what happens when using HDFql. Your algorithm above, written with HDFql in C, looks as follows:

// declare variable
char script[100];

// create an HDF5 file named 'test.h5' and use it (i.e. open it)
hdfql_execute("create and use file test.h5");

// create an HDF5 group named 'my_group'
hdfql_execute("create group my_group");

// create 100 arrays
for(int i = 0; i < 100; i++)
{
    // prepare script to create a one dimensional (size 10) dataset of data type int
    sprintf(script, "create dataset my_group/my_dataset_%d as int(10)", i);

    // execute script
    hdfql_execute(script);
}

// close file
hdfql_execute("close file");

In the code above, this reuse is done transparently from the user's point of view, but it is what happens under the hood.
