Convert JSON to CSV Nested JSON with Python Pandas-CodePudding

I have a python script which can take a JSON file and pulls out specific columns I want with the Pandas json_normalize function. But I have a nested JSON set of values within the JSON that I am trying to pull out but cant get the code to properly get those values.

Below is the JSON value. The top tier is "cardEditions" within that tier is "cardDetails". I want to grab some of the displayName and value details from this nested json and put them into the csv with the cardEditions and the editionNo value.

Looking for the output to be in a CSV pipe delimited like the following with the displayValues as the headers from the nested json.

editionNo	Name	Edition	Position
666	Matt Hu	1st Edition	Center Field

{
    "cardEditions": [{
        "editionNo": 666,
        "id": 1111,
        "cardDetails": [{
                "valueType": "Text",
                "displayValueType": "Text",
                "displayName": "Name",
                "value": "Matt Hu"
            },
            {
                "valueType": "Text",
                "displayValueType": "Text",
                "displayName": "Edition",
                "value": "1st Edition"
            },
            {
                "valueType": "Text",
                "displayValueType": "Text",
                "displayName": "Position",
                "value": "Center Field"
            }

        ],
        "cardStatus": "NA"
    }]
}

CodePudding user response：

import pandas as pd

d = {
    "cardEditions": [{
        "editionNo": 666,
        "id": 1111,
        "cardDetails": [{
                "valueType": "Text",
                "displayValueType": "Text",
                "displayName": "Name",
                "value": "Matt Hu"
            },
            {
                "valueType": "Text",
                "displayValueType": "Text",
                "displayName": "Edition",
                "value": "1st Edition"
            },
            {
                "valueType": "Text",
                "displayValueType": "Text",
                "displayName": "Position",
                "value": "Center Field"
            }

        ],
        "cardStatus": "NA"
    }]
}

df = pd.DataFrame(columns=['editionNo', 'Name', 'Edition', 'Position'])
for i, edition in enumerate(d['cardEditions']):
    no = edition['editionNo']
    vals = [details['value'] for details in edition['cardDetails']]
    df.loc[i, :] = (no, *vals)
print(df)

prints

index	editionNo	Name	Edition	Position
0	666	Matt Hu	1st Edition	Center Field

CodePudding user response：

Here is another solution with json_normalize and pivot.

# where "j" is your json data
out = (
    pd.json_normalize(json.loads(j)['cardEditions'], record_path=['cardDetails'], meta='editionNo')
    .drop(['valueType', 'displayValueType'],axis=1)
    .pivot(index='editionNo', columns='displayName', values='value')
    .rename_axis(columns=None)
    .reset_index()
)
print(out)

Output:

   editionNo      Edition     Name      Position
0        666  1st Edition  Matt Hu  Center Field

If somebody knows how to use json_normalize with record_path of which we don't want all fields, I'd really like to know. I had to drop to columns from the record_path because I don't know how to skip them in the first place.