How to convert pandas dataframe into nested dictionary or json?-CodePudding

I am running Python 3.8 and Pandas 0.19.2 and have a DataFrame which looks as follows:

id_number	name	amount	addenda
1234	ABCD	$100	Car-wash-$30
1234	ABCD	$100	Maintenance-$70

I need a dictionary/JSON like below:

[
    {
       'id_number': 1234,
       'name': 'ABCD',
       'amount': '$100',
       'addenda': [ 
                  {'payment_related_info': 'Car-wash-$30'}, 
                  {'payment_related_info': 'Maintenance-$70'}
                  ]
    }
]

I tried using groupby and to_dict, but it did not work. Any suggestions? Thanks up front for the help.

CodePudding user response：

Just apply a groupby and aggregate by creating a dataframe inside like this:

data = {
    "id_number": [1234, 1234],
    "name": ["ABCD", "ABCD"],
    "amount": ["$100", "$100"],
    "addenda": ["Car-wash-$30", "Maintenance-$70"]
}
df = pd.DataFrame(data=data)

df.groupby(by=["id_number", "name", "amount"]) \
    .agg(lambda col: pd.DataFrame(data=col) \
         .rename(columns={"addenda": "payment_related_info"})) \
    .reset_index() \
    .to_json(orient="records")

This returns axactly the result you want!

CodePudding user response：

If we work backwards, you'll need your DataFrame to have the addenda information in a single row before using .to_dict operation:

id_number	name	amount	addenda
1234	ABCD	$100	[{payment_related_info: Car-wash-$30, payment_related_info: Maintenance-$70}]

To get here, you can perform a groupby on id_number, name, amount, then apply a function that collapses the strings in the addenda rows for that groupby into a list of dictionaries where each key is the string 'payment_related_info'.

This works as expected if you add more rows to your original df as well:

id_number	name	amount	addenda
1234	ABCD	$100	Car-wash-$30
1234	ABCD	$100	Maintenance-$70
2345	BCDE	$200	Car-wash-$100
2345	BCDE	$200	Maintenance-$100

def collapse_row(x):
    addenda_list = x["addenda"].to_list()
    last_row = x.iloc[-1]
    last_row["addenda"] = [{'payment_related_info':v} for v in addenda_list] 
    return last_row

grouped = df.groupby(["id_number","name","amount"]).apply(collapse_row).reset_index(drop=True)
grouped.to_dict(orient='records')

Result:

[
    {
       "id_number":1234,
       "name":"ABCD",
       "amount":"$100",
       "addenda":[
                 {"payment_related_info":"Car-wash-$30"},                              
                 {"payment_related_info":"Maintenance-$70"}
                 ]
    },
    {
      "id_number":2345,
       "name":"BCDE",
       "amount":"$200",
       "addenda":[
                 {"payment_related_info":"Car-wash-$100"}, 
                 {"payment_related_info":"Maintenance-$100"}
                 ]
    }
]