I have a pandas DataFrame like this:
id | unit | step | phase | start_or_end_of_phase | op_name | occurence |
---|---|---|---|---|---|---|
1 | A | 50l | LOAD | start | P12load5 | 2 |
2 | A | 50l | LOAD | end | P12load5 | 2 |
3 | A | 50l | STIR | start | P12s5 | 4 |
4 | A | 50l | STIR | end | P13s5 | 3 |
5 | A | 50l | COLLECT | start | F7_col1 | 1 |
6 | A | 50l | COLLECT | end | H325_col1 | 1 |
7 | A | 1000l | SET_TEMP | start | xyz | 2 |
8 | A | 1000l | SET_TEMP | end | qwe | 3 |
9 | A | 1000l | SET_TEMP2 | start | asf | 4 |
10 | A | 1000l | SET_TEMP2 | end | fdsa | 5 |
11 | A | 1000l | FILTER | start | 4fags | 1 |
11 | A | 1000l | FILTER | end | mllsgrs_1 | 1 |
12 | B | MACHINE1 | ... | ... | ... | ... |
...and want to create nested dictionaries like this:
A = {'50l': {
         'LOAD': {
             'start': {'op_name': 'P12load5', 'occurrence': 2},
             'end':   {'op_name': 'P12load5', 'occurrence': 2}},
         'STIR': {
             'start': {'op_name': 'P12s5', 'occurrence': 4},
             'end':   {'op_name': 'P13s5', 'occurrence': 3}},
         'COLLECT': {
             'start': {'op_name': 'F7_col1', 'occurrence': 1},
             'end':   {'op_name': 'H325_col1', 'occurrence': 1}}
         },
     '1000l': {
         'SET_TEMP': ....
I have been trying to combine groupby() with to_dict() but couldn't wrap my head around it. My last attempt was this (based on How to convert pandas dataframe to nested dictionary):
populated_dict = process_steps_table.groupby(['unit', 'step', 'phase', 'start_or_end_of_phase']).apply(lambda x: x.set_index('start_or_end_of_phase').to_dict(orient='index')).to_dict()
and got this error: DataFrame index must be unique for orient='index'.
I am not sure whether I need to call set_index() on each group inside the lambda, and if so, why.
CodePudding user response:
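For a self-contained run, the unit-A rows from the question's table can be rebuilt first (a minimal reconstruction; the id column is omitted since it is not needed below):

import pandas as pd

# Minimal reconstruction of the unit-A rows shown in the question.
df = pd.DataFrame({
    'unit': ['A'] * 12,
    'step': ['50l'] * 6 + ['1000l'] * 6,
    'phase': ['LOAD', 'LOAD', 'STIR', 'STIR', 'COLLECT', 'COLLECT',
              'SET_TEMP', 'SET_TEMP', 'SET_TEMP2', 'SET_TEMP2', 'FILTER', 'FILTER'],
    'start_or_end_of_phase': ['start', 'end'] * 6,
    'op_name': ['P12load5', 'P12load5', 'P12s5', 'P13s5', 'F7_col1', 'H325_col1',
                'xyz', 'qwe', 'asf', 'fdsa', '4fags', 'mllsgrs_1'],
    'occurence': [2, 2, 4, 3, 1, 1, 2, 3, 4, 5, 1, 1],
})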
You have to reshape your dataframe before exporting it as a dictionary:
nested_cols = ['step', 'phase', 'start_or_end_of_phase']
value_cols = ['op_name', 'occurence']
# Reshape your dataframe
df1 = df.set_index(nested_cols)[value_cols].stack()
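# At this point df1 is a Series whose 4-level MultiIndex carries every future
# dictionary key, i.e. (step, phase, start_or_end_of_phase, column name) -> value:
#   ('50l', 'LOAD', 'start', 'op_name')   -> 'P12load5'
#   ('50l', 'LOAD', 'start', 'occurence') -> 2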
# Export nested dict
d = {}
# items() yields (index_tuple, value) pairs:
#   t -> the flattened MultiIndex tuple, walked to build the nested keys
#   v -> the value stored at the deepest level of the nested dict
for t, v in df1.items():
    e = d.setdefault(t[0], {})   # create a new entry with an empty dict
    for k in t[1:-1]:
        e = e.setdefault(k, {})  # create a nested sub-entry with an empty dict
    e[t[-1]] = v                 # add the value once the end of the index tuple is reached
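Once built, the nested dict can be read back with chained keys, for example:

print(d['50l']['LOAD']['start']['op_name'])      # P12load5
print(d['1000l']['FILTER']['end']['occurence'])  # 1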
Output
import json  # just for a nicer representation
print(json.dumps(d, indent=4))
# Output
{
    "50l": {
        "LOAD": {
            "start": {
                "op_name": "P12load5",
                "occurence": 2
            },
            "end": {
                "op_name": "P12load5",
                "occurence": 2
            }
        },
        "STIR": {
            "start": {
                "op_name": "P12s5",
                "occurence": 4
            },
            "end": {
                "op_name": "P13s5",
                "occurence": 3
            }
        },
        "COLLECT": {
            "start": {
                "op_name": "F7_col1",
                "occurence": 1
            },
            "end": {
                "op_name": "H325_col1",
                "occurence": 1
            }
        }
    },
    "1000l": {
        "SET_TEMP": {
            "start": {
                "op_name": "xyz",
                "occurence": 2
            },
            "end": {
                "op_name": "qwe",
                "occurence": 3
            }
        },
        "SET_TEMP2": {
            "start": {
                "op_name": "asf",
                "occurence": 4
            },
            "end": {
                "op_name": "fdsa",
                "occurence": 5
            }
        },
        "FILTER": {
            "start": {
                "op_name": "4fags",
                "occurence": 1
            },
            "end": {
                "op_name": "mllsgrs_1",
                "occurence": 1
            }
        }
    }
}
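If you also need one dictionary per unit, as in the A = {...} example from the question, the same reshape-and-walk logic can be applied to each group. A sketch, where to_nested is just a helper name introduced here:

def to_nested(frame):
    # Same reshape + walk as above, applied to a single unit's rows.
    s = frame.set_index(nested_cols)[value_cols].stack()
    out = {}
    for t, v in s.items():
        e = out.setdefault(t[0], {})
        for k in t[1:-1]:
            e = e.setdefault(k, {})
        e[t[-1]] = v
    return out

# One nested dict per unit, e.g. per_unit['A']['50l']['LOAD']['start']['op_name']
per_unit = {unit: to_nested(group) for unit, group in df.groupby('unit')}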