I have been trying to normalize a JSON file to succesfully convert it to a CSV later, but i just cant get to work.
Im trying normalize it with pandas pd.json_normalize but it seems to be a very nested JSON.
This is the json structure:
{
"documentReport": {
"documentId": "62f0f78a5e1428137492451c",
"flowName": "sdsf",
"flowStateName": "sent",
"userName": "20304050607",
"shortUrl": "vHppw8dm",
"uploadDateUtc": "08/08/2022 08:46:18",
"appId": "tucarpeta.com",
"claro": {
"ctl_Id": "15836272",
"numberSds": "00441426TAE",
"numberOT": "12047839",
"entity": "193811",
"entityType": "432",
"channel": "1",
"presencial": 1,
"customer": {
"documentNumber": "23763407",
"name": "PAULA RAQUEL",
"lastName": "VELASQUEZ",
"cuit": null,
"sex": "F",
"email": "[email protected]"
}
}
},
"claroEventKpiReportModel": {
"deviceData": {
"operativeSystem": "Android 10 undefined",
"model": "SAMSUNG SM-J400M",
"device": "mobile",
"browser": "Samsung Internet for Android"
},
"eventData": {
"eventDateUtc": "08/08/2022 08:49:50",
"eventName": "IDOC_DOCUMENT_OPENED_BY_AVA",
"userName": "system"
},
"result": "Ok"
}
}
and so it goes on with the next record.
The output i need is the following:
documentId | flowName | flowStateName | userName | uploadDateUtc | numberSds |
---|---|---|---|---|---|
62f0f78a5e1428137492451c | sdsf | sent | 20304050607 | 08/08/2022 08:46:18 | 00441426TAE |
And so with other records.
Can anyone help me how to use json_normalize to do so? Or how to get it done anyway. Thanks
CodePudding user response:
After converting literal null
value that is not a string to np.nan
, use
pd.DataFrame.from_dict(dict).T
CodePudding user response:
pd.json_normalize
can't parse every JSON schema out there. You can preprocess the data before turning it to a data frame:
df = pd.DataFrame([{
**{
key: record["documentReport"][key]
for key in ["documentId", "flowName", "flowStateName", "userName", "uploadDateUtc"]
},
"numberSds": record["documentReport"]["claro"]["numberSds"]
} for record in json_array])