Home > Software design >  Pandas: How to combine rows based on multiple columns
Pandas: How to combine rows based on multiple columns

Time:10-16

Say I have a dataframe like this

import pandas as pd 

test = [
    {1: 434, 2: 343, 3: [592]},
    {1: 434, 2: 343, 3: [192]},
    {1: 534, 2: 743, 3: [392]},
]

df = pd.DataFrame(test)
df

1   2   3
0   434 343 [592]
1   434 343 [192]
2   534 743 [392]

I want to combine rows where columns 2 and 3 are the same, and add up the lists in column 3.

Desired result

1   2   3
0   434 343 [592, 192]
2   534 743 [392]

Attempt so far

I believe group by could be used, and then some sort of aggregation function following it to combine the lists. So something like

df.groupby([1, 2]).aggregate(aggregation_functions)

Though I'm not sure what to put as the aggregation_functions

CodePudding user response:

You are nearly there. Try:

res = df.groupby([1, 2], as_index=False)[3].sum()

print(res)

     1    2           3
0  434  343  [592, 192]
1  534  743       [392]

If you want to keep the first index for each group [e.g. 0, 2], you can use:

df[3] = df.groupby([1, 2])[3].transform(sum)
df.drop_duplicates(subset=[1, 2], inplace=True)

print(df)

     1    2           3
0  434  343  [592, 192]
2  534  743       [392]
  • Related