Hi I am attempting to reshape this json data within a dataframe using pandas.
id categories
1 3ee877e0 [{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]
Expected outcome
id entity_def_id permalink uuid value
3ee877e0 category blockchain 1fea6201 Blockchain
3ee877e0 category cryptocurrency bd082f4d Cryptocurrency
3ee877e0 category loyalty-programs 4a45af54 Loyalty Programs
3ee877e0 category marketplace-772d 772da8fe Marketplace
3ee877e0 category software c08b5441 Software
Sorry for not posting my attempts at doing so, but I am new to python, and already know how to do it in mongodb and dataiku, just want to know of a way to do so using python.
CodePudding user response:
You can try explode
categories
columns and then convert dictionaries in categories
column to multiple columns
out = (df.explode('categories', ignore_index=True)
.pipe(lambda df: df.join(pd.DataFrame(df.pop('categories').values.tolist()))))
print(out)
id entity_def_id permalink uuid value
0 3ee877e0 category blockchain 1fea6201 Blockchain
1 3ee877e0 category cryptocurrency bd082f4d Cryptocurrency
2 3ee877e0 category loyalty-programs 4a45af54 Loyalty Programs
3 3ee877e0 category marketplace-772d 772da8fe Marketplace
4 3ee877e0 category software c08b5441 Software
CodePudding user response:
You could take the list of dicts in categories
, pass it to DataFrame()
as-is, and then insert your id column using insert
import pandas as pd
current_df = pd.DataFrame({"id": "3ee877e0","categories":[[{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]]})
id_ = current_df.loc[:,"id"].values[0]
categories = current_df.loc[:,"categories"].values[0]
new_df = pd.DataFrame(categories )
new_df.insert(0,"id", id_ )
result
id entity_def_id permalink uuid value
0 3ee877e0 category blockchain 1fea6201 Blockchain
1 3ee877e0 category cryptocurrency bd082f4d Cryptocurrency
2 3ee877e0 category loyalty-programs 4a45af54 Loyalty Programs
3 3ee877e0 category marketplace-772d 772da8fe Marketplace
4 3ee877e0 category software c08b5441 Software