I have spent days trying to solve this problem. This was the most effective way I could find how to iterate this object: I get the result, but when I use pandas the DataFrame structure is not correct. So I was exploring the idea of creating the columns without pandas, but I don't get the expected result.
Any idea how I could improve this code?
Sample data:
{
"financeCodeId": "fc-5599",
"financeData": [
{
"date": "2022-01-30",
"calcTotalReturn": 0.022425456852
},
{
"date": "2022-01-30",
"calcTotalReturn": 0.022425456852
},
{
"date": "2022-01-30",
"calcTotalReturn": 0.022425456852
},
{
"date": "2022-01-30",
"calcTotalReturn": 0.022425456852
},
{
"date": "2022-01-30",
"calcTotalReturn": 0.022425456852
},
{
"date": "2022-02-28",
"calcTotalReturn": -0.0424735070051586,
"financeDataAttributes": [
{
"attributeId": "a-256",
"value": "12.032791372796499"
},
{
"attributeId": "a-257",
"value": "9.975964795996589"
},
{
"attributeId": "a-258",
"value": "4.719852927810759"
},
{
"attributeId": "a-259",
"value": "4.18144793134823"
},
]
}
Code:
items = []
for i in response_finance['results']:
items.append(i.get('financeCodeId'))
for j in i['financeData']:
if 'financeDataAttributes' not in j:
items.append({j.get('date'), j.get('calcTotalReturn')})
else:
for k in j['financeDataAttributes']:
items.append({k.get('attributeId'), k.get('value')})
df = pd.DataFrame.from_records(items)
flat = pd.json_normalize(json.loads(df.to_json(orient="records")))
Output
Expected:
CodePudding user response:
Since you already go through your json manually, you don't need json_normalize
. You can create your own flattened record (I fixed the formatting of your json, check if it matches your real data):
response_finance = [{
"financeCodeId": "fc-5599",
"financeData": [
{
"date": "2022-01-30",
"calcTotalReturn": 0.022425456852
},
{
"date": "2022-01-30",
"calcTotalReturn": 0.022425456852
},
{
"date": "2022-01-30",
"calcTotalReturn": 0.022425456852
},
{
"date": "2022-01-30",
"calcTotalReturn": 0.022425456852
},
{
"date": "2022-01-30",
"calcTotalReturn": 0.022425456852
},
{
"date": "2022-02-28",
"calcTotalReturn": -0.0424735070051586,
"financeDataAttributes": [
{
"attributeId": "a-256",
"value": "12.032791372796499"
},
{
"attributeId": "a-257",
"value": "9.975964795996589"
},
{
"attributeId": "a-258",
"value": "4.719852927810759"
},
{
"attributeId": "a-259",
"value": "4.18144793134823"
},
]
}
]
}
]
items = []
for i in response_finance:
for j in i['financeData']:
if 'financeDataAttributes' not in j:
item = {}
item['financeCodeId'] = i.get('financeCodeId')
item.update({'date': j.get('date'), 'calcTotalReturn': j.get('calcTotalReturn')})
items.append(item)
else:
for k in j['financeDataAttributes']:
item = {}
item['financeCodeId'] = i.get('financeCodeId')
item.update({'date': j.get('date'), 'calcTotalReturn': j.get('calcTotalReturn')})
item.update({'attributeId': k.get('attributeId'), 'value': k.get('value')})
items.append(item)
df = pd.DataFrame.from_records(items)
print(df)
Output:
financeCodeId date calcTotalReturn attributeId value
0 fc-5599 2022-01-30 0.022425 NaN NaN
1 fc-5599 2022-01-30 0.022425 NaN NaN
2 fc-5599 2022-01-30 0.022425 NaN NaN
3 fc-5599 2022-01-30 0.022425 NaN NaN
4 fc-5599 2022-01-30 0.022425 NaN NaN
5 fc-5599 2022-02-28 -0.042474 a-256 12.032791372796499
6 fc-5599 2022-02-28 -0.042474 a-257 9.975964795996589
7 fc-5599 2022-02-28 -0.042474 a-258 4.719852927810759
8 fc-5599 2022-02-28 -0.042474 a-259 4.18144793134823