Flatten a JSON list into a DataFrame with multiple prefixed columns

I have JSON with some nested array items, like the sample below, and I want to flatten it before saving it to a CSV file:

[{'SKU':'SKU1','name':'test name 1',
    'ItemSalesPrices':[{'SourceNumber': 'OEM', 'AssetNumber': 'TEST1A', 'UnitPrice': 1600}, {'SourceNumber': 'RRP', 'AssetNumber': 'TEST1B', 'UnitPrice': 1500}],
},
{'SKU':'SKU2','name':'test name 2',
    'ItemSalesPrices':[{'SourceNumber': 'RRP', 'AssetNumber': 'TEST2', 'UnitPrice': 1500}],
}
]

I tried the solution from "flatten nested JSON and retain columns" (and pandas json_normalize) but got nowhere, so I'm hoping for some tips from the community. The output I'm after looks like this:

SKU	Name	ItemSalesPrices_OEM_UnitPrice	ItemSalesPrices_OEM_AssetNumber	ItemSalesPrices_RRP_UnitPrice	ItemSalesPrices_RRP_AssetNumber
SKU1	test name 1	1600	TEST1A	1500	TEST1B
SKU2	test name 2			1500	TEST2

Thank you

CodePudding user response:

Use json_normalize:

import pandas as pd

# L is the list of dicts from the question
first = ['SKU','name']
df = pd.json_normalize(L,'ItemSalesPrices', first)
print (df)
  SourceNumber AssetNumber  UnitPrice   SKU         name
0          OEM      TEST1A       1600  SKU1  test name 1
1          RRP      TEST1B       1500  SKU1  test name 1
2          RRP       TEST2       1500  SKU2  test name 2

Then pivot the values. The aggregation function sums numeric columns and joins string columns with a comma:

import numpy as np

# sum numeric columns, comma-join string columns
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x)

df1 = (df.pivot_table(index=first, 
                      columns='SourceNumber', 
                      aggfunc=f))
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')

df1 = df1.rename_axis(None, axis=1).reset_index()
print (df1)
    SKU         name AssetNumber_OEM AssetNumber_RRP  UnitPrice_OEM  \
0  SKU1  test name 1          TEST1A          TEST1B         1600.0   
1  SKU2  test name 2             NaN           TEST2            NaN   

   UnitPrice_RRP  
0         1500.0  
1         1500.0
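
The column names above come out as UnitPrice_OEM rather than the ItemSalesPrices_OEM_UnitPrice layout asked for in the question. A minimal sketch of one way to get that layout and produce CSV text follows. It assumes each SKU has at most one entry per SourceNumber (so plain DataFrame.pivot is enough and no aggregation is needed) and a pandas version that accepts a list for pivot's index (1.1 or later). The names long_df, wide, ordered and csv_text are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
L = [
    {'SKU': 'SKU1', 'name': 'test name 1',
     'ItemSalesPrices': [
         {'SourceNumber': 'OEM', 'AssetNumber': 'TEST1A', 'UnitPrice': 1600},
         {'SourceNumber': 'RRP', 'AssetNumber': 'TEST1B', 'UnitPrice': 1500}]},
    {'SKU': 'SKU2', 'name': 'test name 2',
     'ItemSalesPrices': [
         {'SourceNumber': 'RRP', 'AssetNumber': 'TEST2', 'UnitPrice': 1500}]},
]

first = ['SKU', 'name']
fields = ['UnitPrice', 'AssetNumber']

# One row per nested price entry, with SKU and name repeated.
long_df = pd.json_normalize(L, 'ItemSalesPrices', first)

# Assumption: no duplicate (SKU, name, SourceNumber) rows;
# DataFrame.pivot raises ValueError if there are any.
wide = long_df.pivot(index=first, columns='SourceNumber', values=fields)

# Columns are (field, source) tuples; rebuild them as prefix_source_field.
wide.columns = [f'ItemSalesPrices_{source}_{field}'
                for field, source in wide.columns]
wide = wide.reset_index()

# Group the columns by source, in the order shown in the question.
sources = sorted(long_df['SourceNumber'].unique())
ordered = [f'ItemSalesPrices_{s}_{f}' for s in sources for f in fields]
wide = wide[first + ordered]

# With no path, to_csv returns the CSV as a string;
# pass a file name instead to write it to disk.
csv_text = wide.to_csv(index=False)
print(csv_text)
```

If a SKU can have several entries for the same SourceNumber, pivot will fail and the pivot_table approach with an aggregation function shown above is the safer choice.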