Home > front end >  Include JSON section numbers as columns in a df while converting JSON to DF
Include JSON section numbers as columns in a df while converting JSON to DF

Time:06-21

I have a nested JSON as follows:

{
"group": {
    "groupname": "grp1",
    "groupid": 1,
    "city": "London"
},
"persons": {
    "0": {
        "name": "john",
        "age": 12,
        "gender": "M",
        "groupid": 1
    },
    "1": {
        "name": "maat",
        "age": 15,
        "gender": "M",
        "groupid": 1
    },
    "2": {
        "name": "chrissle",
        "age": 10,
        "gender": "F",
        "groupid": 1
    },
    "3": {
        "name": "stacy",
        "age": 11,
        "gender": "F",
        "groupid": 1
    },
    "4": {
        "name": "mark",
        "age": 12,
        "gender": "M",
        "groupid": 1
    },
    "5": {
        "name": "job",
        "age": 12,
        "gender": "M",
        "groupid": 1
    }
},
"group": {
    "groupname": "grp1",
    "groupid": 2,
    "city": "NewYork"
},
"persons": {
    "0": {
        "name": "will",
        "age": 12,
        "gender": "M",
        "groupid": 2
    },
    "1": {
        "name": "phil",
        "age": 15,
        "gender": "M",
        "groupid": 2
    },
    "2": {
        "name": "winnie",
        "age": 10,
        "gender": "F",
        "groupid": 2
    }
}
}

I want to separate the two sections group and persons into two df respectively.

For the second df persons I want to include the section numbers as columns as follows:

id    name    age    gender    groupid
0     john     12      M         1
1     maat     15      M         1
2     chrissle 10      F         1

I have loaded the JSON as a list of dict and converted it into a df:

data= pd.DataFrame.from_dict(data)

I can then get persons

personsdf= personsdf['persons']

This will however give me a df with one column that has dict rows for every persons section.

I have tried below to unnest the dict rows:

finaldf= pd.DataFrame()
for index, row in personsdf.iterrows():
    row_data=row['personsdf']
    row_data = pd.DataFrame.from_dict(row_data)
    row_data = row_data.T
    finaldf= finaldf.append(row_data, ignore_index=True)

But then I get all the columns except the section number which gets lost. Is there a better way to approach this?

CodePudding user response:

If I understand you correctly you want to create two dataframes: one for groups and the second for persons:

data = [
    {
        "group": {"groupname": "grp1", "groupid": 1, "city": "London"},
        "persons": {
            "0": {"name": "john", "age": 12, "gender": "M", "groupid": 1},
            "1": {"name": "maat", "age": 15, "gender": "M", "groupid": 1},
            "2": {"name": "chrissle", "age": 10, "gender": "F", "groupid": 1},
            "3": {"name": "stacy", "age": 11, "gender": "F", "groupid": 1},
            "4": {"name": "mark", "age": 12, "gender": "M", "groupid": 1},
            "5": {"name": "job", "age": 12, "gender": "M", "groupid": 1},
        },
    },
    {
        "group": {"groupname": "grp1", "groupid": 2, "city": "NewYork"},
        "persons": {
            "0": {"name": "will", "age": 12, "gender": "M", "groupid": 2},
            "1": {"name": "phil", "age": 15, "gender": "M", "groupid": 2},
            "2": {"name": "winnie", "age": 10, "gender": "F", "groupid": 2},
        },
    },
]

df1 = pd.DataFrame([d["group"] for d in data])
df2 = pd.DataFrame(
    [{"id": k, **v} for d in data for k, v in d["persons"].items()]
)
print(df1)
print(df2)

Prints:

  groupname  groupid     city
0      grp1        1   London
1      grp1        2  NewYork

  id      name  age gender  groupid
0  0      john   12      M        1
1  1      maat   15      M        1
2  2  chrissle   10      F        1
3  3     stacy   11      F        1
4  4      mark   12      M        1
5  5       job   12      M        1
6  0      will   12      M        2
7  1      phil   15      M        2
8  2    winnie   10      F        2
  • Related