How to do 3 layer flattening nested json to dataframe?-CodePudding

For example the JSON is :

{
    "samples": [
        {
            "sample_id": "A2434",
            "start": "1664729482",
            "end": "1664729482",
            "parts": [
                {
                    "name": "123",
                    "start": "1664736682",
                    "end": "1618688700",
                    "fail": ""
                }
            ]
        }
    ]
}

I want the df and columns like below :

sample_id,start,end,parts.name,parts.start,parts.end,parts.fail

CodePudding user response：

Using json.normalize

df = pd.json_normalize(
    data=data["samples"],
    record_path="parts",
    record_prefix="parts.",
    meta=["sample_id", "start", "end"]
).drop(columns="parts.name")

print(df)

  parts.start   parts.end parts.fail sample_id       start         end
0  1664736682  1618688700                A2434  1664729482  1664729482

CodePudding user response：

Use pd.read_json() to unpack the JSON to a Dataframe. You can then use pd.json_normalize() as required on the generated columns to get the more nested data out.

CodePudding user response：

You can use df.explode to get the separate item in a list on a row, and use agg(pd.Series) to convert key, value pairs in a dictionary to columns and rows, respectively.

df = pd.DataFrame(data['samples'])
df[['parts.start','parts.end','parts.fail']] = df['parts'].explode().agg(pd.Series)
df.drop('parts', axis=1, inplace=True)

Output:

     sample_id  start   end      parts.name parts.start parts.end   parts.fail
0   A2434   1664729482  1664729482   123    1664736682  1618688700