Home > Mobile >  Nested json containing arrays to dataframe
Nested json containing arrays to dataframe

Time:04-28

I have the following json:

{
    "meta": {
        "collection": 0
    },
    "series": {
        "default": {
            "data": {
                "columns": [
                    {
                        "columnName": "A",
                        "columnType": "STRING"
                    },
                    {
                        "columnName": "B",
                        "columnType": "STRING"
                    }
                ],
                "rows": [
                    [
                        {
                            "columnName": "A",
                            "value": "X1"
                        },
                        {
                            "columnName": "B",
                            "value": "Y1"
                        }
                    ],
                    [
                        {
                            "columnName": "A",
                            "value": "X2"
                        },
                        {
                            "columnName": "B",
                            "value": "Y2"
                        }
                    ]
                ]
            }
        }
    }
}

I'm trying to parse this json into a dataframe which should look like this:

    A    B
   --------
    X1   Y1
    X2   Y2

Here's what I have tried so far:

import pandas as pd
results = {"meta":{"collection":0},"series":{"default":{"data":{"columns":[{"columnName":"A","columnType":"STRING"},{"columnName":"B","columnType":"STRING"}],"rows":[[{"columnName":"A","value":"X1"},{"columnName":"B","value":"Y1"}],[{"columnName":"A","value":"X2"},{"columnName":"B","value":"Y2"}]]}}}}
s = results["series"]["default"]["data"]
df = pd.json_normalize(s, record_path=["rows"])

The problem is the columns in the df contain the json array, and not the values

                                    0                                   1
0  {'columnName': 'A', 'value': 'X1'}  {'columnName': 'B', 'value': 'Y1'}
1  {'columnName': 'A', 'value': 'X2'}  {'columnName': 'B', 'value': 'Y2'}

Is there anyway to use json_normalize to achieve the result I want, or is parsing through json myself the way to go?

CodePudding user response:

One option is to stack it and create a DataFrame with the stacked Series and pivot it:

out = (pd.DataFrame(my_data['series']['default']['data']['rows'])
       .stack()
       .pipe(lambda x: pd.DataFrame(x.tolist(), index=x.index))
       .reset_index(level=0)
       .pivot('level_0','columnName','value')
       .rename_axis(index=[None], columns=[None]))

An easier (and probably more efficient) solution is to reformat the data in Python using dict.setdefault in a nested loop; then build the DataFrame:

tmp = {}
for lst in my_data['series']['default']['data']['rows']:
    for d in lst:
        tmp.setdefault(d['columnName'], []).append(d['value'])
out = pd.DataFrame(tmp)

Output:

    A   B
0  X1  Y1
1  X2  Y2
  • Related