I read a JSON and make some transformations to it in a Dataframe and at the end I need a new JSON but in a different format as the source. the transforemd Dataframe looks like this
operation_system version provider AppID
0 Windows 2008 Microsoft APP01
1 Windows 2008 Microsoft APP04
2 Windows 2012 Microsoft APP01
3 Windows 2016 Microsoft APP01
4 Windows 2019 Microsoft APP01
5 Windows 2008 Microsoft APP02
6 Windows 2016 Microsoft APP02
7 Windows 2016 Microsoft APP03
8 Windows 2016 Microsoft APP04
9 Windows 2012 Microsoft APP02
10 Windows 2008 Microsoft APP03
11 Windows 2019 Microsoft APP03
12 Windows 2019 Microsoft APP03
13 Linux 7.5 RedHat APP02
14 Linux 7.6 RedHat APP03
15 Linux 7.8 RedHat APP03
16 Linux 7.7 RedHat APP03
17 Linux 7.5 RedHat APP01
18 Linux 7.6 RedHat APP04
19 Linux 7.8 RedHat APP01
20 Linux 7.7 RedHat APP04
I then try to generate my JSON with the groupby method but don't get the result I want
df = pd.DataFrame(
columns=("operation_system", "version", "provider","AppID"),
data=[("Windows", "2008", "Microsoft","APP01"),("Windows", "2008", "Microsoft","APP04"),("Windows", "2012", "Microsoft","APP01"),("Windows", "2016", "Microsoft","APP01"),("Windows", "2019", "Microsoft","APP01"),
("Windows", "2008", "Microsoft","APP02"),("Windows", "2016", "Microsoft","APP02"),("Windows", "2016", "Microsoft","APP03"),("Windows", "2016", "Microsoft","APP04"),
("Windows", "2012", "Microsoft","APP02"),("Windows", "2008", "Microsoft","APP03"),("Windows", "2019", "Microsoft","APP03"),("Windows", "2019", "Microsoft","APP03"),
("Linux", "7.5", "RedHat","APP02"),("Linux", "7.6", "RedHat","APP03"),("Linux", "7.8", "RedHat","APP03"),("Linux", "7.7", "RedHat","APP03"),
("Linux", "7.5", "RedHat","APP01"),("Linux", "7.6", "RedHat","APP04"),("Linux", "7.8", "RedHat","APP01"),("Linux", "7.7", "RedHat","APP04")]
)
compact_df = (df.groupby(['provider','operation_system','version'])
.apply(lambda x: x[['AppID']].to_dict(orient='list'))
.reset_index()
.to_json(orient='records', indent=2)
)
what I get is this:
[
{
"provider":"Microsoft",
"operation_system":"Windows",
"version":"2008",
"0":{
"AppID":[
"APP01",
"APP04",
"APP02",
"APP03"
]
}
},
{
"provider":"Microsoft",
"operation_system":"Windows",
"version":"2012",
"0":{
"AppID":[
"APP01",
"APP02"
]
}
},
...
but what I want is the following (it would be ok if the node "0":{} is still there but if you can remove it I would prefere that)
[
{
"type": "ITComponent",
"data": {
"provider": "Microsoft",
"operation_system": "Windows",
"version": "2008",
"AppID": [
"APP01",
"APP04",
"APP02",
"APP03"
]
}
},
{
"type": "ITComponent",
"data": {
"provider": "Microsoft",
"operation_system": "Windows",
"version": "2012",
"AppID": [
"APP01",
"APP02"
]
}
},
{
"type": "ITComponent",
"data": {
"provider": "Microsoft",
"operation_system": "Windows",
"version": "2016",
"AppID": [
"APP01",
"APP02",
"APP03",
"APP04"
]
}
}
]
CodePudding user response:
Try:
import json
data = df.groupby(['provider','operation_system','version'])['AppID'] \
.apply(list).reset_index().to_dict(orient='records')
out = json.dumps([{'type': 'ITComponent', 'data': d} for d in data], indent=2)
Output:
>>> print(out)
[
{
"type": "ITComponent",
"data": {
"provider": "Microsoft",
"operation_system": "Windows",
"version": "2008",
"AppID": [
"APP01",
"APP04",
"APP02",
"APP03"
]
}
},
{
"type": "ITComponent",
"data": {
"provider": "Microsoft",
"operation_system": "Windows",
"version": "2012",
"AppID": [
"APP01",
"APP02"
]
}
},
...
]