h5py doesn't support NumPy dtype('U') (Unicode) and pandas doesn't support NumPy

Time:08-13

I'm trying to create an .h5 file with a dataset that contains the data from a .dat file. First, I tried it with NumPy:

import numpy as np
import h5py

filename = 'VAL220408-invparms.dat'
datasetname = 'EM27_104_COCCON_VAL/220408'

dtvec = [float for i in range(149)]  # My data file has 149 columns
dtvec[1] = str
dtvec[2] = str #I specify the dtype of the second and third column

dataset = np.genfromtxt(filename,skip_header=0,names=True,dtype=dtvec)

fh5 = h5py.File('my_data.h5', 'w')
fh5.create_dataset(datasetname,data=dataset)
fh5.flush()
fh5.close()

But when running I get the error:

TypeError: No conversion path for dtype: dtype('<U')

If I don't specify the dtype, the file is written without errors and the numerical values are correct, but the second and third columns contain only NaN, which is not what I want.
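
One possible way around this first error, sketched under assumptions: h5py accepts fixed-length byte strings (NumPy 'S' dtype) but not 'U', so the Unicode fields of the structured array could be re-encoded before writing. The helper name unicode_fields_to_bytes is hypothetical, and the sketch assumes the string columns contain only ASCII text.

```python
import numpy as np


def unicode_fields_to_bytes(arr):
    """Copy a structured array, storing every 'U' field as ASCII bytes ('S')."""
    new_descr = []
    for name in arr.dtype.names:
        field = arr.dtype[name]
        if field.kind == 'U':
            # A 'U' field uses 4 bytes per character; keep at least 1 char.
            n_chars = max(field.itemsize // 4, 1)
            new_descr.append((name, f'S{n_chars}'))
        else:
            new_descr.append((name, field))

    out = np.empty(arr.shape, dtype=new_descr)
    for name in arr.dtype.names:
        if arr.dtype[name].kind == 'U':
            # Assumption: the text is pure ASCII; other text raises an error.
            out[name] = np.char.encode(arr[name], 'ascii')
        else:
            out[name] = arr[name]
    return out
```

The converted array could then be passed as data=unicode_fields_to_bytes(dataset) in the create_dataset call above. Strings read back from the file will be bytes and need decoding.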

I found that h5py does not support NumPy's Unicode string dtype ('U'), so I assumed that a pandas DataFrame would work instead. My pandas code looks like this:

import numpy as np
import pandas as pd
import h5py

filename = 'VAL220408-invparms.dat'
datasetname = 'EM27_104_COCCON_VAL/220408'

df = pd.read_csv(filename, header=0, sep=r"\s+")  # whitespace-separated columns

fh5 = h5py.File('my_data.h5', 'w')
fh5.create_dataset(datasetname,data=df)
fh5.flush()
fh5.close()

But then I get the error:

TypeError: Object dtype dtype('O') has no native HDF5 equivalent
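
This second error comes from the text columns, which pandas holds as Python objects. A minimal sketch of one possible workaround is to turn the DataFrame into a NumPy structured array whose text columns are fixed-length byte strings. The helper name frame_to_structured is hypothetical; the sketch assumes every non-numeric column holds ASCII text and that the frame is not empty.

```python
import numpy as np
import pandas as pd


def frame_to_structured(df):
    """Build a structured array from df, encoding non-numeric columns as bytes."""
    columns = []
    for name in df.columns:
        col = df[name]
        if pd.api.types.is_numeric_dtype(col):
            values = col.to_numpy()
        else:
            # Assumption: non-numeric columns are ASCII text.
            # NumPy picks the 'S' width from the longest encoded value.
            values = np.array([str(v).encode('ascii') for v in col])
        columns.append((str(name), values))

    dtype = np.dtype([(name, values.dtype) for name, values in columns])
    out = np.empty(len(df), dtype=dtype)
    for name, values in columns:
        out[name] = values
    return out
```

The result could be passed as data=frame_to_structured(df) in the create_dataset call above, which keeps everything in a single compound dataset.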

Then I found that pandas has a method that writes a DataFrame to an .h5 file, so instead of using the h5py library I did:

df.to_hdf('my_data.h5','datasetname',format='table',mode='a')

But the data ends up scattered across several tables inside the .h5 file.
