I am working on converting an Excel sheet to nested JSON, where the grouping has to apply to the header as well as to the items. This is what I have tried so far.
I am able to apply the transformation rules using pandas:
df['Header'] = df[['A','B']].to_dict('records')
df['Item'] = df[['A', 'C', 'D']].to_dict('records')
This lets me separate the records into the header and item structures. Then I apply the following:
data_groupedby = data.groupby(['A', 'B']).agg(list).reset_index()
result = data_groupedby[['A', 'B', 'Item']].to_json(orient='records')
This gives me the required JSON with the header and a nested drill-down of the items. With groupby I am able to group the header fields, but I cannot apply the grouping to the respective items, so they are not grouped correctly. Any idea how this can be achieved?
Example data (Excel):
A B C D
100 Test1 XX10 L
100 Test1 XX10 L
100 Test1 XX20 L
101 Test2 XX10 L
101 Test2 XX20 L
101 Test2 XX20 L
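For reproducibility, the sample above can be built directly as a DataFrame (in my actual flow the frame comes from pd.read_excel; the file name below is only a placeholder):
import pandas as pd

# Stand-in for the Excel sheet shown above;
# in practice: data = pd.read_excel("input.xlsx")  # placeholder file name
data = pd.DataFrame({
    "A": [100, 100, 100, 101, 101, 101],
    "B": ["Test1", "Test1", "Test1", "Test2", "Test2", "Test2"],
    "C": ["XX10", "XX10", "XX20", "XX10", "XX20", "XX20"],
    "D": ["L", "L", "L", "L", "L", "L"],
})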
Current output:
[
    {
        "A": 100,
        "B": "Test1",
        "Item": [
            {
                "A": 100,
                "C": "XX10",
                "D": "L"
            },
            {
                "A": 100,
                "C": "XX10",
                "D": "L"
            },
            {
                "A": 100,
                "C": "XX20",
                "D": "L"
            }
        ]
    },
    {
        "A": 101,
        "B": "Test2",
        "Item": [
            {
                "A": 101,
                "C": "XX10",
                "D": "L"
            },
            {
                "A": 101,
                "C": "XX20",
                "D": "L"
            },
            {
                "A": 101,
                "C": "XX20",
                "D": "L"
            }
        ]
    }
]
If you look at the Item arrays, the identical entries are not grouped and are repeated.
Thanks TC
CodePudding user response:
You can drop_duplicates and then groupby, then apply the to_dict transformation for columns C and D, and finally clean up with reset_index and rename.
(data.drop_duplicates()                                  # remove exact duplicate rows
     .groupby(["A", "B"])                                # one group per header
     .apply(lambda x: x[["C", "D"]].to_dict("records"))  # items of each group as a list of dicts
     .to_frame()
     .reset_index()                                      # bring A and B back as columns
     .rename(columns={0: "Item"})
     .to_dict("records"))
Output:
[{'A': 100,
'B': 'Test1',
'Item': [{'C': 'XX10', 'D': 'L'}, {'C': 'XX20', 'D': 'L'}]},
{'A': 101,
'B': 'Test2',
'Item': [{'C': 'XX10', 'D': 'L'}, {'C': 'XX20', 'D': 'L'}]}]