Home > database >  python: create nested JSON with pandas and add parent node to it
python: create nested JSON with pandas and add parent node to it

Time:02-25

I read a JSON and make some transformations to it in a Dataframe and at the end I need a new JSON but in a different format as the source. the transforemd Dataframe looks like this

operation_system version   provider  AppID
0           Windows    2008  Microsoft  APP01
1           Windows    2008  Microsoft  APP04
2           Windows    2012  Microsoft  APP01
3           Windows    2016  Microsoft  APP01
4           Windows    2019  Microsoft  APP01
5           Windows    2008  Microsoft  APP02
6           Windows    2016  Microsoft  APP02
7           Windows    2016  Microsoft  APP03
8           Windows    2016  Microsoft  APP04
9           Windows    2012  Microsoft  APP02
10          Windows    2008  Microsoft  APP03
11          Windows    2019  Microsoft  APP03
12          Windows    2019  Microsoft  APP03
13            Linux     7.5     RedHat  APP02
14            Linux     7.6     RedHat  APP03
15            Linux     7.8     RedHat  APP03
16            Linux     7.7     RedHat  APP03
17            Linux     7.5     RedHat  APP01
18            Linux     7.6     RedHat  APP04
19            Linux     7.8     RedHat  APP01
20            Linux     7.7     RedHat  APP04

I then try to generate my JSON with the groupby method but don't get the result I want

df = pd.DataFrame(
columns=("operation_system", "version", "provider","AppID"),
data=[("Windows", "2008", "Microsoft","APP01"),("Windows", "2008", "Microsoft","APP04"),("Windows", "2012", "Microsoft","APP01"),("Windows", "2016", "Microsoft","APP01"),("Windows", "2019", "Microsoft","APP01"),
("Windows", "2008", "Microsoft","APP02"),("Windows", "2016", "Microsoft","APP02"),("Windows", "2016", "Microsoft","APP03"),("Windows", "2016", "Microsoft","APP04"),
("Windows", "2012", "Microsoft","APP02"),("Windows", "2008", "Microsoft","APP03"),("Windows", "2019", "Microsoft","APP03"),("Windows", "2019", "Microsoft","APP03"),
("Linux", "7.5", "RedHat","APP02"),("Linux", "7.6", "RedHat","APP03"),("Linux", "7.8", "RedHat","APP03"),("Linux", "7.7", "RedHat","APP03"),
("Linux", "7.5", "RedHat","APP01"),("Linux", "7.6", "RedHat","APP04"),("Linux", "7.8", "RedHat","APP01"),("Linux", "7.7", "RedHat","APP04")]

)

compact_df =  (df.groupby(['provider','operation_system','version'])
.apply(lambda x: x[['AppID']].to_dict(orient='list'))
.reset_index()
.to_json(orient='records', indent=2)
)

what I get is this:

    [
  {
    "provider":"Microsoft",
    "operation_system":"Windows",
    "version":"2008",
    "0":{
      "AppID":[
        "APP01",
        "APP04",
        "APP02",
        "APP03"
      ]
    }
  },
  {
    "provider":"Microsoft",
    "operation_system":"Windows",
    "version":"2012",
    "0":{
      "AppID":[
        "APP01",
        "APP02"
      ]
    }
  },
...

but what I want is the following (it would be ok if the node "0":{} is still there but if you can remove it I would prefere that)

[
    {
        "type": "ITComponent",
        "data": {
            "provider": "Microsoft",
            "operation_system": "Windows",
            "version": "2008",
            "AppID": [
                "APP01",
                "APP04",
                "APP02",
                "APP03"
            ]
        }
    },
    {
        "type": "ITComponent",
        "data": {
            "provider": "Microsoft",
            "operation_system": "Windows",
            "version": "2012",
            "AppID": [
                "APP01",
                "APP02"
            ]
        }
    },
    {
        "type": "ITComponent",
        "data": {
            "provider": "Microsoft",
            "operation_system": "Windows",
            "version": "2016",
            "AppID": [
                "APP01",
                "APP02",
                "APP03",
                "APP04"
            ]
        }
    }
]

CodePudding user response:

Try:

import json

data = df.groupby(['provider','operation_system','version'])['AppID'] \
         .apply(list).reset_index().to_dict(orient='records')

out = json.dumps([{'type': 'ITComponent', 'data': d} for d in data], indent=2)

Output:

>>> print(out)
[
  {
    "type": "ITComponent",
    "data": {
      "provider": "Microsoft",
      "operation_system": "Windows",
      "version": "2008",
      "AppID": [
        "APP01",
        "APP04",
        "APP02",
        "APP03"
      ]
    }
  },
  {
    "type": "ITComponent",
    "data": {
      "provider": "Microsoft",
      "operation_system": "Windows",
      "version": "2012",
      "AppID": [
        "APP01",
        "APP02"
      ]
    }
  },
  ...
]
  • Related