Home > Mobile >  How do I normalize deeply nested JSON results that has a lot of different collections?
How do I normalize deeply nested JSON results that has a lot of different collections?

Time:10-26

What I have is a result from monday.com via a graphql query. The data comes back with a lot of different collections such as a dictionary of a list of dictionaries... eg.

{
  "data": {
    "boards": [
      {
        "name": "board name",
        "groups": [
                    {"title": "group title 1"},
                    {"title": "group title 2"},
                    {"title": "group title 3"},
                    {"title": "group title 4"}
                  ]
       }
     ]
   },
  "account_id": "0000000"
}

and I want to make a pandas dataframe like

     group.board    group.title
0    'board_name'   'group title 1'
1    'board_name'   'group title 2'
2    'board_name'   'group title 3'
3    'board_name'   'group title 4'

I've tried

pd.json_normalize(json_data, 'boards')

But I keep getting KeyError: 'boards'

CodePudding user response:

Your json dict is invalid (missing a closing brace, so I fixed it). I would do it like this. Here we are going into the dict at json_data["data"]["boards"] because that's where the data is, using "groups" as the records key, and using the field "name" as one of the metadata fields.

import pandas as pd

json_data = {
    "data": {
        "boards": [
            {
                "name": "board name",
                "groups": [
                    { "title": "group title 1" },
                    { "title": "group title 2" },
                    { "title": "group title 3" },
                    { "title": "group title 4" }
                ]
            },
        ]
    },
    "account_id": "0000000"
}

pd.json_normalize(json_data["data"]["boards"], "groups", ["name"])

Output:


    title           name
0   group title 1   board name
1   group title 2   board name
2   group title 3   board name
3   group title 4   board name

CodePudding user response:

You get a keyerror because json_data has no key boards. json_data["data"] does, but you still wouldn't get your expected result.

You need to do pass json_data["data"]["boards"] as the list of dicts, ask for "groups" as the record_path, and ["name"] as the meta path:

>>> df = pd.json_normalize(json_data["data"]["boards"], "groups", ["name"])
           title        name
0  group title 1  board name
1  group title 2  board name
2  group title 3  board name
3  group title 4  board name

Then, you can rename the columns:

>>> df.columns = ["group.title", "group.board"]

     group.title group.board
0  group title 1  board name
1  group title 2  board name
2  group title 3  board name
3  group title 4  board name
  • Related