Why does setting key='table' in pd.DataFrame.to_hdf() create an extra empty key in the resulting file?


When writing a pandas DataFrame to an HDF5 file, if key is set to 'table' then the resulting file contains an extra empty key '/'. Other string values I have tried do not do this, and it seems strange that the behaviour would depend on the name of a key. Why does this happen?

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df.to_hdf('hdf1', key='a_key_that_is_not_table')
>>> df.to_hdf('hdf2', key='table')
>>> store1 = pd.HDFStore('hdf1')
>>> store2 = pd.HDFStore('hdf2')
>>> store1.keys()
['/a_key_that_is_not_table']
>>> store2.keys()
['/', '/table']

Updated example script:

#!/usr/bin/python3

import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
keys = ['a_key_that_is_not_table', 'table']
for idx, key in enumerate(keys):
    filename = f'df{idx}.h5'
    df.to_hdf(filename, key=key, mode='w', format='table')
    store = pd.HDFStore(filename)
    print(f'Loop {idx}, key = {key}, store.keys() ={store.keys()}')
    store.close()

Output:

Loop 0, key = a_key_that_is_not_table, store.keys() =['/a_key_that_is_not_table']
Loop 1, key = table, store.keys() =['/', '/table']

CodePudding user response:

Every HDF5 file has a "root group" referenced as "/". If you inspect both files with HDFView, you will find each has 1 group (named '/a_key_that_is_not_table' in file df0.h5 and '/table' in file df1.h5), so it's not an error from an HDF5 schema standpoint.

Looking deeper into the files, I suspect the issue comes from the pandas abstraction layer on top of PyTables. Both files have the same schema. Under each named key (HDF5 group) there is a group named '_i_table' which has a subgroup named 'index' and a dataset named 'table'. Likely 'table' is a reserved name, and using it as a key trips up pandas' key-name logic. Changing the key from 'table' to 'Table' eliminates the '/' in the output for df1.h5.
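If it helps, here is a small inspection sketch (assuming PyTables and pandas are installed and that the df0.h5/df1.h5 files from the question are present; the df2.h5 filename is just made up for this example). It prints every node in the two files so you can compare their layouts, and then retries the write with the key renamed to 'Table':

import tables
import pandas as pd

# Walk every node in the two files from the question to compare their layouts.
for filename in ('df0.h5', 'df1.h5'):
    print(f'--- {filename} ---')
    with tables.open_file(filename, mode='r') as h5:
        for node in h5.walk_nodes('/'):
            print(node._v_pathname, type(node).__name__)

# Retry the write with the key capitalized as 'Table'; the spurious '/'
# entry should no longer appear in store.keys().
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.to_hdf('df2.h5', key='Table', mode='w', format='table')
with pd.HDFStore('df2.h5') as store:
    print(store.keys())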

CodePudding user response:

You need to specify the format of your file. Also, I think you need to pass mode='w' (write) to the function, because the default mode is 'a' (append). For example:

df.to_hdf('data.h5', key='df', mode='w')

For more details, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_hdf.html
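As a rough sketch of that suggestion (data.h5 and the df key are just placeholder names), writing with an explicit mode and format and then reading the frame back might look like:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Overwrite the file rather than appending, and store in 'table' format.
df.to_hdf('data.h5', key='df', mode='w', format='table')

# Read it back to confirm the round trip.
print(pd.read_hdf('data.h5', key='df'))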
