I have a data load in nested json format and I want to get a nested dataframe
{
"page_count": 21,
"page_number": 1,
"page_size": 300,
"total_records": 6128,
"registrants": [
{
"id": "23lnTNqyQ3qkthfghjgkk",
"first_name": "HUGO",
"last_name": "MACHA ILLEN",
"email": "[email protected]",
"address": "",
"city": "",
"country": "",
"zip": "",
"state": "",
"phone": "",
"industry": "",
"org": "",
"job_title": "",
"purchasing_time_frame": "",
"role_in_purchase_process": "",
"no_of_employees": "",
"comments": "",
"custom_questions": [
{
"title": "Departamento/ Región",
"value": ""
},
{
"title": "Género",
"value": "Masculino"
},
{
"title": "Edad",
"value": "De 35 a 55 años"
},
{
"title": "Nivel de estudio",
"value": "Técnico / Superior"
},
{
"title": "¿Eres cliente de una entidad financiera?",
"value": "Si"
},
{
"title": "¿Tiene una empresa?",
"value": "Si"
}
I use this funtion: pat is the file json
df = pd.json_normalize(json.loads(pat.explode("custom_questions").to_json(orient="records")))
but it didn't work for the output i wanted.
I want get someone like this:
I want the value of the "titles", that is, the keys" to be displayed as the column headers and the value of "value" as the data. As well as the image that is attached
CodePudding user response:
you can use json_normalize:
df = pd.DataFrame(your_json['registrants']).explode('custom_questions').reset_index(drop=True)
df=df.join(pd.json_normalize(df.pop('custom_questions')))
#convert rows to columns. Set index first 17 columns. We will not use theese.
df=df.set_index(df.columns[0:17].to_list())
dfx=df.pivot_table(values='value',columns='title',aggfunc=list).apply(pd.Series.explode).reset_index(drop=True)
#we have duplicate rows. Drop them
df=df.reset_index().drop(['value','title'],axis=1).drop_duplicates().reset_index(drop=True)
df=df.join(dfx)
'''
| | id | first_name | last_name | email | address | city | country | zip | state | phone | industry | org | job_title | purchasing_time_frame | role_in_purchase_process | no_of_employees | comments | Departamento/ Región | Edad | Género | Nivel de estudio | ¿Eres cliente de una entidad financiera? | ¿Tiene una empresa? |
|---:|:----------------------|:-------------|:--------------|:-------------------|:----------|:-------|:----------|:------|:--------|:--------|:-----------|:------|:------------|:------------------------|:---------------------------|:------------------|:-----------|:-----------------------|:----------------|:----------|:-------------------|:-------------------------------------------|:----------------------|
| 0 | 23lnTNqyQ3qkthfghjgkk | HUGO | MACHA ILLEN | [email protected] | | | | | | | | | | | | | | | De 35 a 55 años | Masculino | Técnico / Superior | Si | Si |
'''