Home > Net >  reshape nested json data in a dataframe using python to get desired output
reshape nested json data in a dataframe using python to get desired output

Time:08-10

Hi I am attempting to reshape this json data within a dataframe using pandas.

      id        categories
1     3ee877e0  [{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]

Expected outcome

id        entity_def_id  permalink         uuid        value
3ee877e0  category       blockchain        1fea6201    Blockchain
3ee877e0  category       cryptocurrency    bd082f4d    Cryptocurrency
3ee877e0  category       loyalty-programs  4a45af54    Loyalty Programs
3ee877e0  category       marketplace-772d  772da8fe    Marketplace
3ee877e0  category       software          c08b5441    Software

Sorry for not posting my attempts at doing so, but I am new to python, and already know how to do it in mongodb and dataiku, just want to know of a way to do so using python.

CodePudding user response:

You can try explode categories columns and then convert dictionaries in categories column to multiple columns

out = (df.explode('categories', ignore_index=True)
       .pipe(lambda df: df.join(pd.DataFrame(df.pop('categories').values.tolist()))))
print(out)

         id entity_def_id         permalink      uuid             value
0  3ee877e0      category        blockchain  1fea6201        Blockchain
1  3ee877e0      category    cryptocurrency  bd082f4d    Cryptocurrency
2  3ee877e0      category  loyalty-programs  4a45af54  Loyalty Programs
3  3ee877e0      category  marketplace-772d  772da8fe       Marketplace
4  3ee877e0      category          software  c08b5441          Software

CodePudding user response:

You could take the list of dicts in categories, pass it to DataFrame() as-is, and then insert your id column using insert

import pandas as pd

current_df = pd.DataFrame({"id": "3ee877e0","categories":[[{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]]})

id_ = current_df.loc[:,"id"].values[0]
categories = current_df.loc[:,"categories"].values[0]

new_df = pd.DataFrame(categories )
new_df.insert(0,"id", id_ )

result

         id entity_def_id         permalink      uuid             value
0  3ee877e0      category        blockchain  1fea6201        Blockchain
1  3ee877e0      category    cryptocurrency  bd082f4d    Cryptocurrency
2  3ee877e0      category  loyalty-programs  4a45af54  Loyalty Programs
3  3ee877e0      category  marketplace-772d  772da8fe       Marketplace
4  3ee877e0      category          software  c08b5441          Software
  • Related