I am trying to convert a dataframe that has similar naming convention into a single json
format column.
Sample Data:
import pandas as pd
df = pd.DataFrame({'id' : 1,
'userName' : 'john',
'productlist0.name' : 'shoe',
'productlist0.price' : 45.89,
'productlist0.brand' : 'nike',
'productlist1.name' : 'jeans',
'productlist1.price' : 19.45,
'productlist1.brand' : 'howes',
'productlist2.name' : 'watch',
'productlist2.price' : 60.0,
'productlist2.brand' : 'fossil'
}, index = [0])
So we have bunch of columns starting with productlist
and that share the column names after the period.
I need to covert these columns into below json
format:
df1 = pd.DataFrame(
{'id' : 1,
'userName' : 'john',
'productlist' : '''[{'name' : 'shoe', 'price' : 45.89, 'brand' : 'nike'},
{'name' : 'jeans', 'price' : 19.45, 'brand' : 'howes'},
{'name' : 'watch', 'price' : 60.0, 'brand' : 'fossil'}]'''
}, index = [0]
)
df1
id userName productlist
0 1 john [{'name' : 'shoe', 'price' : 45.89, 'brand' : 'nike'}, {'name' : 'jeans', 'price' : 19.45, 'brand' : 'howes'}, {'name' : 'watch', 'price' : 60.0, 'brand' : 'fossil'}]
I tried to use the stack
approach and got here:
df.filter(regex = '^productlist').rename(columns = lambda x : re.sub(r'productlist\d\.', '', x)).stack().reset_index().\
groupby(['level_0', 'level_1'])[0].agg(dict)
level_0 level_1
0 brand {2: 'nike', 5: 'howes', 8: 'fossil'}
name {0: 'shoe', 3: 'jeans', 6: 'watch'}
price {1: 45.89, 4: 19.45, 7: 60.0}
Name: 0, dtype: object
But I am not sure how to proceed from here. Could someone please help me on this.
CodePudding user response:
You can do it like:
df1 = df.filter(regex="^productlist\d .").T
df1.index = pd.MultiIndex.from_tuples([(a[0], a[1]) for a in df1.index.str.split(".")])
product_values = df1.unstack().droplevel(0, axis=1).to_dict("records")
df1 = pd.concat(
[
df[["id", "userName"]],
pd.DataFrame({"productlist": [product_values]}, index=[0]),
],
axis=1,
)
First get productlist
columns. Then reindex with (productlist, key)
format and unstack and get a dict in the form of "records" out of it. That will be your list of dicts. Use that to create a new dataframe with column "productlist" and then concat it to original df without productlist\d .
columns
print(df1):
id userName productlist
0 1 john [{'brand': 'nike', 'name': 'shoe', 'price': 45...