JSON list flatten to dataframe as multiple columns with prefix

Time:12-10

I have a JSON list with some nested array items, like the one below, and I want to flatten it before saving it to a CSV.

[{'SKU':'SKU1','name':'test name 1',
    'ItemSalesPrices':[{'SourceNumber': 'OEM', 'AssetNumber': 'TEST1A', 'UnitPrice': 1600}, {'SourceNumber': 'RRP', 'AssetNumber': 'TEST1B', 'UnitPrice': 1500}],
},
{'SKU':'SKU2','name':'test name 2',
    'ItemSalesPrices':[{'SourceNumber': 'RRP', 'AssetNumber': 'TEST2', 'UnitPrice': 1500}],
}
]

I tried the solution in "flatten nested JSON and retain columns" (or pandas json_normalize) but got nowhere, so I'm hoping to get some tips from the community. This is the output I'm aiming for:

SKU   Name         ItemSalesPrices_OEM_UnitPrice  ItemSalesPrices_OEM_AssetNumber  ItemSalesPrices_RRP_UnitPrice  ItemSalesPrices_RRP_AssetNumber
SKU1  test name 1  1600                           TEST1A                           1500                           TEST1B
SKU2  test name 2                                                                  1500                           TEST2

Thank you

Answer:

Use pandas.json_normalize, with ItemSalesPrices as the record path and SKU and name as metadata columns:

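import pandas as pd

first = ['SKU', 'name']  # top-level keys copied onto every price row
df = pd.json_normalize(L, 'ItemSalesPrices', first)  # L is the list from the question
print(df)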
  SourceNumber AssetNumber  UnitPrice   SKU         name
0          OEM      TEST1A       1600  SKU1  test name 1
1          RRP      TEST1B       1500  SKU1  test name 1
2          RRP       TEST2       1500  SKU2  test name 2
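To run this step on its own, here is a minimal sketch that binds the question's data to L (the name the answer assumes) and spells out json_normalize's record_path and meta keywords. Each element of ItemSalesPrices becomes a row, and SKU and name are repeated onto the rows that came from the same record.

```python
import pandas as pd

# The question's data, under the name L that the answer assumes
L = [{'SKU': 'SKU1', 'name': 'test name 1',
      'ItemSalesPrices': [{'SourceNumber': 'OEM', 'AssetNumber': 'TEST1A', 'UnitPrice': 1600},
                          {'SourceNumber': 'RRP', 'AssetNumber': 'TEST1B', 'UnitPrice': 1500}]},
     {'SKU': 'SKU2', 'name': 'test name 2',
      'ItemSalesPrices': [{'SourceNumber': 'RRP', 'AssetNumber': 'TEST2', 'UnitPrice': 1500}]}]

# record_path: the nested list that becomes one row per element
# meta: top-level keys repeated onto every row built from that record
df = pd.json_normalize(L, record_path='ItemSalesPrices', meta=['SKU', 'name'])
print(df)
```

The record columns come first and the meta columns are appended after them, which is why SKU and name appear on the right.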

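Then pivot the SourceNumber values into columns with pivot_table. The custom aggregation sums numeric columns and comma-joins string columns. It only makes a difference when a SKU has more than one price for the same SourceNumber: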

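import numpy as np

# Aggregate each group: sum numbers, comma-join strings
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x)

df1 = df.pivot_table(index=first,
                     columns='SourceNumber',
                     aggfunc=f)
# Flatten the (column, SourceNumber) MultiIndex into names like UnitPrice_OEM
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')

df1 = df1.rename_axis(None, axis=1).reset_index()
print(df1)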
    SKU         name AssetNumber_OEM AssetNumber_RRP  UnitPrice_OEM  \
0  SKU1  test name 1          TEST1A          TEST1B         1600.0   
1  SKU2  test name 2             NaN           TEST2            NaN   

   UnitPrice_RRP  
0         1500.0  
1         1500.0  
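If you want the exact column names and order from the question (ItemSalesPrices_<source>_<field>, grouped by source) and whole-number prices instead of 1600.0, here is a sketch that swaps pivot_table for set_index plus unstack. It assumes each SKU has at most one price per SourceNumber, because unstack raises a ValueError on duplicates; in that case keep the pivot_table version above. The ItemSalesPrices_ prefix is hard-coded, and the names fields, wide and sources are only illustrative.

```python
import pandas as pd

# The question's data
L = [{'SKU': 'SKU1', 'name': 'test name 1',
      'ItemSalesPrices': [{'SourceNumber': 'OEM', 'AssetNumber': 'TEST1A', 'UnitPrice': 1600},
                          {'SourceNumber': 'RRP', 'AssetNumber': 'TEST1B', 'UnitPrice': 1500}]},
     {'SKU': 'SKU2', 'name': 'test name 2',
      'ItemSalesPrices': [{'SourceNumber': 'RRP', 'AssetNumber': 'TEST2', 'UnitPrice': 1500}]}]

first = ['SKU', 'name']
fields = ['UnitPrice', 'AssetNumber']  # per-source fields, in the wanted order
df = pd.json_normalize(L, 'ItemSalesPrices', first)

# One row per SKU, one column per (field, SourceNumber) pair.
# Assumes at most one row per SKU and SourceNumber; unstack raises a
# ValueError on duplicate index entries.
wide = df.set_index(first + ['SourceNumber'])[fields].unstack('SourceNumber')

# Rename the (field, source) pairs to ItemSalesPrices_<source>_<field>
wide.columns = [f'ItemSalesPrices_{src}_{fld}' for fld, src in wide.columns]

# Group the columns by source (OEM fields, then RRP fields, ...)
sources = sorted(df['SourceNumber'].unique())
wide = wide.reindex(columns=[f'ItemSalesPrices_{src}_{fld}' for src in sources for fld in fields])

# Missing prices made the price columns float; nullable Int64 keeps whole numbers
wide = wide.astype({c: 'Int64' for c in wide.columns if c.endswith('_UnitPrice')})

wide = wide.reset_index()
print(wide)
```

From there, wide.to_csv(path, index=False) writes the file. The nullable Int64 columns are written as plain integers, with empty cells where a SKU has no price for that source.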