I have a json with some nested/array items like the one below I'm looking at flattening it before saving it into a csv
[{'SKU':'SKU1','name':'test name 1',
'ItemSalesPrices':[{'SourceNumber': 'OEM', 'AssetNumber': 'TEST1A', 'UnitPrice': 1600}, {'SourceNumber': 'RRP', 'AssetNumber': 'TEST1B', 'UnitPrice': 1500}],
},
{'SKU':'SKU2','name':'test name 2',
'ItemSalesPrices':[{'SourceNumber': 'RRP', 'AssetNumber': 'TEST2', 'UnitPrice': 1500}],
}
]
I have attempted with the good solution here flattern nested JSON and retain columns (or Panda json_normalize) but got no where so I'm hoping to get some tips from the community
SKU | Name | ItemSalesPrices_OEM_UnitPrice | ItemSalesPrices_OEM_AssetNumber | ItemSalesPrices_RRP_UnitPrice | ItemSalesPrices_RRP_AssetNumber |
---|---|---|---|---|---|
SKU1 | test name 1 | 1600 | TEST1A | 1500 | TEST1B |
SKU2 | test name 2 | 1500 | TEST2 |
Thank you
CodePudding user response:
Use json_normalize
:
first = ['SKU','name']
df = pd.json_normalize(L,'ItemSalesPrices', first)
print (df)
SourceNumber AssetNumber UnitPrice SKU name
0 OEM TEST1A 1600 TEST1 test name 1
1 RRP TEST1B 1500 TEST1 test name 1
2 RRP TEST2 1500 TEST2 test name 2
Then you can pivoting values - if numeric use sum
, if strings use join
:
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x)
df1 = (df.pivot_table(index=first,
columns='SourceNumber',
aggfunc=f))
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')
df1 = df1.rename_axis(None, axis=1).reset_index()
print (df1)
SKU name AssetNumber_OEM AssetNumber_RRP UnitPrice_OEM \
0 SKU1 test name 1 TEST1A TEST1B 1600.0
1 SKU2 test name 2 NaN TEST2 NaN
UnitPrice_RRP
0 1500.0
1 1500.0