Let's say I have a pandas dataframe:
| id1 | id2 | attr1 | combo_id | perm_id |
| --- | --- | --- | --- | --- |
| 1 | 2 | [9606] | [1,2] | AB |
| 2 | 1 | [9606] | [1,2] | BA |
| 3 | 4 | [9606] | [3,4] | AB |
| 4 | 3 | [9606] | [3,4] | BA |
I'd like to aggregate rows with the same combo_id into a single row, and keep the attr1 value from each row in a dict keyed by that row's perm_id. So the resulting dataframe would look like:
| attr1 | combo_id |
| --- | --- |
| {'AB':[9606], 'BA': [9606]} | [1,2] |
| {'AB':[9606], 'BA': [9606]} | [3,4] |
How would I use groupby and aggregate functions for these operations?
I tried converting attr1 to a dict keyed by perm_id:
df['attr1'] = df.apply(lambda x: {x['perm_id']: x['attr1']})
Then I planned to use something to combine dictionaries in the same group.
df.groupby(['combo_id']).agg({ 'attr1': lambda x: {x**})
But the first step already fails with KeyError: 'perm_id'.
Any suggestions?
CodePudding user response:
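Lists are unhashable, so combo_id cannot be used as a group key directly. Try grouping on its string form and building the dict per group: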
from ast import literal_eval

out = (
    # group on the string form of combo_id, e.g. "[1, 2]"
    df.groupby(df["combo_id"].astype(str))
    # for each group, map perm_id -> attr1
    .apply(lambda g: dict(zip(g["perm_id"], g["attr1"])))
    .reset_index(name="attr1")
)

# convert combo_id back to list (if needed)
out["combo_id"] = out["combo_id"].apply(literal_eval)
print(out)
Prints:
  combo_id                         attr1
0   [1, 2]  {'AB': [9606], 'BA': [9606]}
1   [3, 4]  {'AB': [9606], 'BA': [9606]}
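As for the KeyError in the question: `df.apply` without `axis=1` passes each column to the lambda, not each row, so `x['perm_id']` is looked up in a column's index and fails. If you prefer your original two-step idea, here is a minimal sketch of how it could work, assuming the sample data from the question. The helper name `merge_dicts` is made up for illustration, and the string group key is the same workaround as above:

```python
from ast import literal_eval

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "id1": [1, 2, 3, 4],
        "id2": [2, 1, 4, 3],
        "attr1": [[9606], [9606], [9606], [9606]],
        "combo_id": [[1, 2], [1, 2], [3, 4], [3, 4]],
        "perm_id": ["AB", "BA", "AB", "BA"],
    }
)

# Step 1: axis=1 makes apply pass each row (not each column) to the lambda,
# so row["perm_id"] and row["attr1"] can be looked up by column name.
df["attr1"] = df.apply(lambda row: {row["perm_id"]: row["attr1"]}, axis=1)


def merge_dicts(group):
    """Hypothetical helper: merge the one-entry dicts of a group into one dict."""
    merged = {}
    for d in group["attr1"]:
        merged.update(d)
    return merged


# Step 2: group on the string form of combo_id (lists are unhashable)
# and merge the per-row dicts within each group.
result = (
    df.groupby(df["combo_id"].astype(str))
    .apply(merge_dicts)
    .reset_index(name="attr1")
)

# Turn the string keys back into lists.
result["combo_id"] = result["combo_id"].apply(literal_eval)
print(result)
```

Note that if two rows in the same group share a perm_id, the later row overwrites the earlier one in the merged dict.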