I have a Pandas dataframe looking like this:
buyer_id car color year
john ferrari yellow 2022
eric ferrari red 2022
john mercedes black 1990
victoria audi yellow 2017
I would like to create a new column (list of jsons in each row.
Create a column 'identical' with a list in each row:
One element in the list if only one buyer is found in 'buyer_id':
[{'car':..., 'color':..., 'year': ...}]
If same buyer on several rows in 'buyer_id'
[ {'car':'ferrari', 'color': 'yellow ', 'year': 2022}, {'car':'mercedes', 'color': 'black', 'year': 1990} ]
Expected output:
buyer_id car color year identical
john ferrari yellow 2022 [{'car':'ferrari', 'color': 'yellow ', 'year': 2022},{'car':'mercedes', 'color': 'black', 'year': 1990}]
eric ferrari red 2022 [{'car':'ferrari', 'color': 'red', 'year': 2022}]
john mercedes black 1990 [[{'car':'ferrari', 'color': 'yellow ', 'year': 2022},{'car':'mercedes', 'color': 'black', 'year': 1990}]
victoria audi yellow 2017 [{'car':'audi', 'color': 'yellow', 'year': 2017}]
I don't know how to do this with Pandas and if it is possible.
CodePudding user response:
You could use GroupBy.apply
and to_json
with the orient="records"
parameter:
s = (df.groupby('buyer_id')
.apply(lambda g: g.drop('buyer_id', axis=1)
.to_json(orient='records'))
)
df2 = df.merge(s.rename('identical'), left_on='buyer_id', right_index=True)
or in place:
s = (df.set_index('buyer_id')
.groupby(level='buyer_id')
.apply(lambda g: g.to_json(orient='records'))
)
df['identical'] = df['buyer_id'].map(s)
output:
buyer_id car color year identical
0 john ferrari yellow 2022 [{"car":"ferrari","color":"yellow","year":2022},{"car":"mercedes","color":"black","year":1990}]
1 eric ferrari red 2022 [{"car":"ferrari","color":"red","year":2022}]
2 john mercedes black 1990 [{"car":"ferrari","color":"yellow","year":2022},{"car":"mercedes","color":"black","year":1990}]
3 victoria audi yellow 2017 [{"car":"audi","color":"yellow","year":2017}]
CodePudding user response:
Try:
to_dict = lambda x: x.to_dict('records')
df['identical'] = df['buyer_id'].map(df.set_index('buyer_id') \
.groupby('buyer_id').apply(to_dict))
print(df)
# Output
buyer_id car color year identical
0 john ferrari yellow 2022 [{'car': 'ferrari', 'color': 'yellow', 'year': 2022}, {'car': 'mercedes', 'color': 'black', 'year': 1990}]
1 eric ferrari red 2022 [{'car': 'ferrari', 'color': 'red', 'year': 2022}]
2 john mercedes black 1990 [{'car': 'ferrari', 'color': 'yellow', 'year': 2022}, {'car': 'mercedes', 'color': 'black', 'year': 1990}]
3 victoria audi yellow 2017 [{'car': 'audi', 'color': 'yellow', 'year': 2017}]
To export your column as JSON, you can use:
>>> df['identical'].to_json(orient='records', indent=2)
[
[
{
"car":"ferrari",
"color":"yellow",
"year":2022
},
{
"car":"mercedes",
"color":"black",
"year":1990
}
],
[
{
"car":"ferrari",
"color":"red",
"year":2022
}
],
[
{
"car":"ferrari",
"color":"yellow",
"year":2022
},
{
"car":"mercedes",
"color":"black",
"year":1990
}
],
[
{
"car":"audi",
"color":"yellow",
"year":2017
}
]
]