Home > Net >  Nested Dict to DataFrame in Python
Nested Dict to DataFrame in Python

Time:05-12

I am trying to get the following dict into a specific DataFrame.

The dict is structured as follows :

test = {'JTTIX US Equity': 
            {'HB_ASSET_CLASS_ALLOCATION': 
                {'Equity': '32.14642',
                 'Debt': '60.890407'},
             'HB_MARKET_CAP_ALLOCATION': 
                {'Small-cap': '0.776239',
                 'Mid-cap': '10.81831',
                 'Large-cap': '88.027725'}}}

The final DataFrame that i am trying to reach is :

BBG_ID              FIELD                       TYPE                        VALUE
JTTIX US Equity     HB_ASSET_CLASS_ALLOCATION   Equity                      32.14642
JTTIX US Equity     HB_ASSET_CLASS_ALLOCATION   Debt                        60.890407
JTTIX US Equity     HB_MARKET_CAP_ALLOCATION    Small-cap                   0.776239
JTTIX US Equity     HB_MARKET_CAP_ALLOCATION    Mid-cap                     10.81831
JTTIX US Equity     HB_MARKET_CAP_ALLOCATION    Large-cap                   88.027725

I tried all the options on this nice answer here Convert list of dictionaries to a pandas DataFrame, but i am still not able to reach the final format i want.

I think i can do it by creating a loop but want to double check if there is a way i can avoid it since the real dict is really big.

CodePudding user response:

Here's one way using stack. Essentially construct the DataFrame, stack it; construct a DataFrame again, stack it again; then reset_index:

tmp = pd.DataFrame.from_dict(test, orient='index').stack()
out = pd.DataFrame(tmp.tolist(), tmp.index).stack().rename_axis(index=['BBG_ID','FIELD','TYPE']).reset_index(name='VALUE')

Output:

            BBG_ID                      FIELD       TYPE      VALUE
0  JTTIX US Equity  HB_ASSET_CLASS_ALLOCATION     Equity   32.14642
1  JTTIX US Equity  HB_ASSET_CLASS_ALLOCATION       Debt  60.890407
2  JTTIX US Equity   HB_MARKET_CAP_ALLOCATION  Small-cap   0.776239
3  JTTIX US Equity   HB_MARKET_CAP_ALLOCATION    Mid-cap   10.81831
4  JTTIX US Equity   HB_MARKET_CAP_ALLOCATION  Large-cap  88.027725

CodePudding user response:

Another approach is to reformat the nested dictionary into a dictionary using a recursive function and construct a Series object and convert it to a DataFrame:

def reformat_d(d, key=()):
    for k,v in d.items():
        if isinstance(v, dict):
            yield from reformat_d(v, key   (k,))
        else:
            yield (key   (k,), v)
            
out = pd.Series(dict(reformat_d(test))).rename_axis(index=['BBG_ID','FIELD','TYPE']).reset_index(name='VALUE')

Output:

            BBG_ID                      FIELD       TYPE      VALUE
0  JTTIX US Equity  HB_ASSET_CLASS_ALLOCATION     Equity   32.14642
1  JTTIX US Equity  HB_ASSET_CLASS_ALLOCATION       Debt  60.890407
2  JTTIX US Equity   HB_MARKET_CAP_ALLOCATION  Small-cap   0.776239
3  JTTIX US Equity   HB_MARKET_CAP_ALLOCATION    Mid-cap   10.81831
4  JTTIX US Equity   HB_MARKET_CAP_ALLOCATION  Large-cap  88.027725
  • Related