Say I have a dataframe like this
import pandas as pd
test = [
{1: 434, 2: 343, 3: [592]},
{1: 434, 2: 343, 3: [192]},
{1: 534, 2: 743, 3: [392]},
]
df = pd.DataFrame(test)
df
1 2 3
0 434 343 [592]
1 434 343 [192]
2 534 743 [392]
I want to combine rows where columns 2 and 3 are the same, and add up the lists in column 3.
Desired result
1 2 3
0 434 343 [592, 192]
2 534 743 [392]
Attempt so far
I believe group by could be used, and then some sort of aggregation function following it to combine the lists. So something like
df.groupby([1, 2]).aggregate(aggregation_functions)
Though I'm not sure what to put as the aggregation_functions
CodePudding user response:
You are nearly there. Try:
res = df.groupby([1, 2], as_index=False)[3].sum()
print(res)
1 2 3
0 434 343 [592, 192]
1 534 743 [392]
If you want to keep the first index for each group [e.g. 0, 2
], you can use:
df[3] = df.groupby([1, 2])[3].transform(sum)
df.drop_duplicates(subset=[1, 2], inplace=True)
print(df)
1 2 3
0 434 343 [592, 192]
2 534 743 [392]