I have a pandas DataFrame like this:
id | unit | step | phase | start_or_end_of_phase | op_name | occurence |
---|---|---|---|---|---|---|
1 | A | 50l | LOAD | start | P12load5 | 2 |
2 | A | 50l | LOAD | end | P12load5 | 2 |
3 | A | 50l | STIR | start | P12s5 | 4 |
4 | A | 50l | STIR | end | P13s5 | 3 |
5 | A | 50l | COLLECT | start | F7_col1 | 1 |
6 | A | 50l | COLLECT | end | H325_col1 | 1 |
7 | A | 1000l | SET_TEMP | start | xyz | 2 |
8 | A | 1000l | SET_TEMP | end | qwe | 3 |
9 | A | 1000l | SET_TEMP2 | start | asf | 4 |
10 | A | 1000l | SET_TEMP2 | end | fdsa | 5 |
11 | A | 1000l | FILTER | start | 4fags | 1 |
11 | A | 1000l | FILTER | end | mllsgrs_1 | 1 |
12 | B | MACHINE1 | ... | ... | ... | ... |
...and want to create nested dictionaries like this:
A = {'50l': {
         'LOAD': {
             'start': {'op_name': 'P12load5', 'occurrence': 2},
             'end':   {'op_name': 'P12load5', 'occurrence': 2}},
         'STIR': {
             'start': {'op_name': 'P12s5', 'occurrence': 4},
             'end':   {'op_name': 'P13s5', 'occurrence': 3}},
         'COLLECT': {
             'start': {'op_name': 'F7_col1', 'occurrence': 1},
             'end':   {'op_name': 'H325_col1', 'occurrence': 1}}
         },
     '1000l': {
         'SET_TEMP': ....
I have been trying to combine groupby() with to_dict() but couldn't wrap my head around it. My last attempt was this (based on How to convert pandas dataframe to nested dictionary):
populated_dict = process_steps_table.groupby(['unit', 'step', 'phase', 'start_or_end_of_phase']).apply(lambda x: x.set_index('start_or_end_of_phase').to_dict(orient='index')).to_dict()
and got this error: DataFrame index must be unique for orient='index'.
I am not sure whether I need to call set_index() on each group inside the lambda, and if so, why.
CodePudding user response:
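For a self-contained run, the unit-A rows from the question's table can be rebuilt first (a minimal reconstruction; the id column is omitted since it is not needed below):

import pandas as pd

# Minimal reconstruction of the unit-A rows shown in the question.
df = pd.DataFrame({
    'unit': ['A'] * 12,
    'step': ['50l'] * 6 + ['1000l'] * 6,
    'phase': ['LOAD', 'LOAD', 'STIR', 'STIR', 'COLLECT', 'COLLECT',
              'SET_TEMP', 'SET_TEMP', 'SET_TEMP2', 'SET_TEMP2', 'FILTER', 'FILTER'],
    'start_or_end_of_phase': ['start', 'end'] * 6,
    'op_name': ['P12load5', 'P12load5', 'P12s5', 'P13s5', 'F7_col1', 'H325_col1',
                'xyz', 'qwe', 'asf', 'fdsa', '4fags', 'mllsgrs_1'],
    'occurence': [2, 2, 4, 3, 1, 1, 2, 3, 4, 5, 1, 1],
})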
You have to reshape your dataframe before exporting it as a dictionary:
nested_cols = ['step', 'phase', 'start_or_end_of_phase']
value_cols = ['op_name', 'occurence']
# Reshape your dataframe
df1 = df.set_index(nested_cols)[value_cols].stack()
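# At this point df1 is a Series whose 4-level MultiIndex carries every future
# dictionary key, i.e. (step, phase, start_or_end_of_phase, column name) -> value:
#   ('50l', 'LOAD', 'start', 'op_name')   -> 'P12load5'
#   ('50l', 'LOAD', 'start', 'occurence') -> 2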
# Export nested dict
d = {}
# items() yields (index_tuple, value) pairs:
#   t -> the flattened MultiIndex tuple, walked to build the nested keys
#   v -> the value stored at the deepest level of the nested dict
for t, v in df1.items():
    e = d.setdefault(t[0], {})   # create a new entry with an empty dict
    for k in t[1:-1]:
        e = e.setdefault(k, {})  # create a nested sub-entry with an empty dict
    e[t[-1]] = v                 # add the value once the end of the index tuple is reached
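Once built, the nested dict can be read back with chained keys, for example:

print(d['50l']['LOAD']['start']['op_name'])      # P12load5
print(d['1000l']['FILTER']['end']['occurence'])  # 1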
Output
import json  # just for a nicer representation
print(json.dumps(d, indent=4))
# Output
{
    "50l": {
        "LOAD": {
            "start": {
                "op_name": "P12load5",
                "occurence": 2
            },
            "end": {
                "op_name": "P12load5",
                "occurence": 2
            }
        },
        "STIR": {
            "start": {
                "op_name": "P12s5",
                "occurence": 4
            },
            "end": {
                "op_name": "P13s5",
                "occurence": 3
            }
        },
        "COLLECT": {
            "start": {
                "op_name": "F7_col1",
                "occurence": 1
            },
            "end": {
                "op_name": "H325_col1",
                "occurence": 1
            }
        }
    },
    "1000l": {
        "SET_TEMP": {
            "start": {
                "op_name": "xyz",
                "occurence": 2
            },
            "end": {
                "op_name": "qwe",
                "occurence": 3
            }
        },
        "SET_TEMP2": {
            "start": {
                "op_name": "asf",
                "occurence": 4
            },
            "end": {
                "op_name": "fdsa",
                "occurence": 5
            }
        },
        "FILTER": {
            "start": {
                "op_name": "4fags",
                "occurence": 1
            },
            "end": {
                "op_name": "mllsgrs_1",
                "occurence": 1
            }
        }
    }
}
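If you also need one dictionary per unit, as in the A = {...} example from the question, the same reshape-and-walk logic can be applied to each group. A sketch, where to_nested is just a helper name introduced here:

def to_nested(frame):
    # Same reshape + walk as above, applied to a single unit's rows.
    s = frame.set_index(nested_cols)[value_cols].stack()
    out = {}
    for t, v in s.items():
        e = out.setdefault(t[0], {})
        for k in t[1:-1]:
            e = e.setdefault(k, {})
        e[t[-1]] = v
    return out

# One nested dict per unit, e.g. per_unit['A']['50l']['LOAD']['start']['op_name']
per_unit = {unit: to_nested(group) for unit, group in df.groupby('unit')}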