I have a dataframe as follows:
lat | long | city | nameDisease | numberCases |
---|---|---|---|---|
0 | 2 | rio | Dengue | 1 |
0 | 2 | rio | Chicungunha | 2 |
1 | 3 | sp | Dengue | 3 |
1 | 3 | sp | COVID | 4 |
I want to aggregate the rows with same (lat,long,city) and generate a json as follows:
[{lat:0,long:2,city:"rio",diseases:[{nameDisease:"Dengue",numberCases:1},{nameDisease:"Chicungunha",numberCases:2}],{lat:1,long:3,city:"sp",diseases:[{nameDisease:"Dengue",numberCases:3},{nameDisease:"COVID",numberCases:4}]]
How can I do this kind of transformation with pandas?
CodePudding user response:
A few to_dict
groupby
calls:
json = df.groupby(cols).apply(lambda g: g.drop(cols, axis=1).to_dict('records')).reset_index().rename({0:'diseases'}, axis=1).to_dict('records')
Output:
>>> json
[{'lat': 0,
'long': 2,
'city': 'rio',
'diseases': [{'nameDisease': 'Dengue', 'numberCases': 1},
{'nameDisease': 'Chicungunha', 'numberCases': 2}]},
{'lat': 1,
'long': 3,
'city': 'sp',
'diseases': [{'nameDisease': 'Dengue', 'numberCases': 3},
{'nameDisease': 'COVID', 'numberCases': 4}]}]
>>> json == expected_output
True