pandas: aggregate rows by creating dictionary key-value pairs based on a column

Let's say I have a pandas dataframe:

| id1 | id2 | attr1 | combo_id | perm_id |
| --- | --- | --- | --- | --- |
| 1 | 2 | [9606] | [1,2] | AB |
| 2 | 1 | [9606] | [1,2] | BA |
| 3 | 4 | [9606] | [3,4] | AB |
| 4 | 3 | [9606] | [3,4] | BA |

I'd like to aggregate rows with the same combo_id together, and store information from both rows using the perm_id of that row. So the resulting dataframe would look like:

| attr1 | combo_id |
| --- | --- |
| {'AB':[9606], 'BA': [9606]} | [1,2] |
| {'AB':[9606], 'BA': [9606]} | [3,4] |

How would I use groupby and aggregate functions for these operations?

I tried converting attr1 to a dict keyed by perm_id:

df['attr1'] = df.apply(lambda x: {x['perm_id']: x['attr1']})

Then I planned to use something to combine dictionaries in the same group:

df.groupby(['combo_id']).agg({ 'attr1': lambda x: {x**})

But this resulted in KeyError: perm_id

Any suggestions?

CodePudding user response:

Try:

from ast import literal_eval

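# Lists are unhashable, so group on the string form of combo_id.
# Within each group, pair every perm_id with that row's attr1.
x = (
    df.groupby(df["combo_id"].astype(str))
    .apply(lambda g: dict(zip(g["perm_id"], g["attr1"])))
    .reset_index(name="attr1")
)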

# convert combo_id back to list (if needed)
x["combo_id"] = x["combo_id"].apply(literal_eval)

print(x)

Prints:

  combo_id                         attr1
0   [1, 2]  {'AB': [9606], 'BA': [9606]}
1   [3, 4]  {'AB': [9606], 'BA': [9606]}
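
As for the KeyError in the question: it most likely comes from the first df.apply call rather than from the groupby. Without axis=1, apply passes each column to the lambda, and a column has no 'perm_id' label to look up. The sketch below is a self-contained variant, assuming the sample frame from the question's table. It shows the row-wise apply with axis=1, and it groups on tuples instead of strings so that literal_eval is not needed. The names keys, row_dicts and result are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame(
    {
        "id1": [1, 2, 3, 4],
        "id2": [2, 1, 4, 3],
        "attr1": [[9606], [9606], [9606], [9606]],
        "combo_id": [[1, 2], [1, 2], [3, 4], [3, 4]],
        "perm_id": ["AB", "BA", "AB", "BA"],
    }
)

# axis=1 hands each row to the lambda, so row["perm_id"] is a valid lookup.
# With the default axis=0 the lambda would receive whole columns instead.
row_dicts = df.apply(lambda row: {row["perm_id"]: row["attr1"]}, axis=1)

# Tuples are hashable, so they can serve as group keys directly.
keys = df["combo_id"].apply(tuple)

result = (
    df.groupby(keys)
    .apply(lambda g: dict(zip(g["perm_id"], g["attr1"])))
    .reset_index(name="attr1")
)

# Convert the tuple keys back to lists (if needed).
result["combo_id"] = result["combo_id"].apply(list)

print(result)
```

Grouping on tuples keeps the keys as real Python objects, which avoids the string round trip. Whether that matters depends on how combo_id is used afterwards.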