Home > OS >  Data Frame to nested dictionary: Python
Data Frame to nested dictionary: Python

Time:12-05

I'm working with a dataframe:

   day        account_id     balance
  2022-11-01  ncw8y7     1.000000
  2022-11-02  ncw8y7     1.000424

Which I would like to transform into nested dictionary in which the 1st level is the account_id, 2nd level is category name "balances", 3rd level is the date which finally contains date as a list and balance:

{"account_id":"ncw8y7","balances":{"2022-11-01":{"day":[2022,11,01],"balance":1.00000},"2022-11-02":{"day":[2022,11,2],"balance":1.000424}}}

I tried to adopt solutions found in this and that post. However, fo some reason when using the first solution to try to learn how to do first part of this transformation (part that does not include yet a date as a list):

dict = results.groupby('account_id')[['day','balance']].apply(lambda x: x.set_index('day').to_dict(orient='index')).to_dict()

I am getting

{'ncw8y7': {datetime.date(2022, 11, 1): {'balance': 1.00000},
  datetime.date(2022, 11, 2): {'balance': 1.000424}}}

I don't get why 'day' variable is getting formated in this way: datetime.date(2022, 11, 1). Is that because of getting it as index? How would you suggest me to adopt the dict function to get to the desired effect?

CodePudding user response:

This should work:

dfx=df.groupby('account_id').agg(list)
final=[]
for i in dfx.index:
    final.append({"account_id":i,"balances":{dfx['day'][i][j]:{"day":[dfx['day'][i][j]],"balance":dfx['balance'][i][j]} for j in range(len(dfx['day'][i]))}})

Output:

[
{
    "account_id":"ncw8y7",
    "balances":{
        "2022-11-01":{
            "day":[
                "2022-11-01"
            ],
            "balance":1.0
        },
        "2022-11-02":{
            "day":[
                "2022-11-02"
            ],
            "balance":1.000424
        }
    }
}

]

CodePudding user response:

you can try using the to_json() method as its output is in dict format (source)

data.to_json('path',orient='records')

it will provide the format u need but if you want to work on the data after the conversion better option is to_dict() I guess

df.to_dict(orient=records)

orient: {‘dict’, ‘list’, ‘series’, ‘split’, ‘tight’, ‘records’, ‘index’}

Determines the type of the values of the dictionary.

‘dict’ (default) : dict like {column -> {index -> value}}

‘list’ : dict like {column -> [values]}

‘series’ : dict like {column -> Series(values)}

‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}

‘tight’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values], ‘index_names’ -> [index.names], ‘column_names’ -> [column.names]}

‘records’ : list like [{column -> value}, … , {column -> value}]

‘index’ : dict like {index -> {column -> value}}
  • Related