Home > Back-end >  Converting nested JSON into a flattened DataFrame
Converting nested JSON into a flattened DataFrame

Time:11-22

I have a data load in nested json format and I want to get a nested dataframe

{
    "page_count": 21,
    "page_number": 1,
    "page_size": 300,
    "total_records": 6128,
    "registrants": [
        {
            "id": "23lnTNqyQ3qkthfghjgkk",
            "first_name": "HUGO",
            "last_name": "MACHA   ILLEN",
            "email": "[email protected]",
            "address": "",
            "city": "",
            "country": "",
            "zip": "",
            "state": "",
            "phone": "",
            "industry": "",
            "org": "",
            "job_title": "",
            "purchasing_time_frame": "",
            "role_in_purchase_process": "",
            "no_of_employees": "",
            "comments": "",
            "custom_questions": [
                {
                    "title": "Departamento/ Región",
                    "value": ""
                },
                {
                    "title": "Género",
                    "value": "Masculino"
                },
                {
                    "title": "Edad",
                    "value": "De 35 a 55 años"
                },
                {
                    "title": "Nivel de estudio",
                    "value": "Técnico / Superior"
                },
                {
                    "title": "¿Eres cliente de una entidad financiera?",
                    "value": "Si"
                },
                {
                    "title": "¿Tiene una empresa?",
                    "value": "Si"
                }

I use this funtion: pat is the file json

df = pd.json_normalize(json.loads(pat.explode("custom_questions").to_json(orient="records")))

but it didn't work for the output i wanted.

I want get someone like this:

enter image description here

I want the value of the "titles", that is, the keys" to be displayed as the column headers and the value of "value" as the data. As well as the image that is attached

CodePudding user response:

you can use json_normalize:

df = pd.DataFrame(your_json['registrants']).explode('custom_questions').reset_index(drop=True)
df=df.join(pd.json_normalize(df.pop('custom_questions')))

#convert rows to columns. Set index first 17 columns. We will not use theese.
df=df.set_index(df.columns[0:17].to_list())
dfx=df.pivot_table(values='value',columns='title',aggfunc=list).apply(pd.Series.explode).reset_index(drop=True)

#we have duplicate rows. Drop them
df=df.reset_index().drop(['value','title'],axis=1).drop_duplicates().reset_index(drop=True)
df=df.join(dfx)
'''
|    | id                    | first_name   | last_name     | email              | address   | city   | country   | zip   | state   | phone   | industry   | org   | job_title   | purchasing_time_frame   | role_in_purchase_process   | no_of_employees   | comments   | Departamento/ Región   | Edad            | Género    | Nivel de estudio   | ¿Eres cliente de una entidad financiera?   | ¿Tiene una empresa?   |
|---:|:----------------------|:-------------|:--------------|:-------------------|:----------|:-------|:----------|:------|:--------|:--------|:-----------|:------|:------------|:------------------------|:---------------------------|:------------------|:-----------|:-----------------------|:----------------|:----------|:-------------------|:-------------------------------------------|:----------------------|
|  0 | 23lnTNqyQ3qkthfghjgkk | HUGO         | MACHA   ILLEN | [email protected] |           |        |           |       |         |         |            |       |             |                         |                            |                   |            |                        | De 35 a 55 años | Masculino | Técnico / Superior | Si                                         | Si                    |

'''
  • Related