I have a DataFrame like this:
index col1 col2 col3 col4
0 11/20 11/26 abc 35
1 11/21 11/24 xxx 30
2 11/22 11/27 abc 20
Here col3 contains a repeated value (abc appears in two rows). I want to sum() col4 across the rows that share the same col3 value.
So in this case I do
df = df.groupby(['col3'])[['col4']].sum()
But with this approach I get
index col3 col4
0 abc 55
1 xxx 30
For each duplicated col3 value, I would like to keep col1 from its first occurrence (11/20) and col2 from its last occurrence (11/27), so the final result would look like:
index col1 col2 col3 col4
0 11/20 11/27 abc 55
1 11/21 11/24 xxx 30
CodePudding user response:
One way is to use pandas.DataFrame.groupby with agg, passing one aggregation per column:
new_df = df.groupby("col3", as_index=False).agg({"col1": "first",
                                                  "col2": "last",
                                                  "col4": "sum"})
print(new_df)
Output:
col3 col1 col2 col4
0 abc 11/20 11/27 55
1 xxx 11/21 11/24 30
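If you want something you can paste and run, here is a minimal sketch of the same idea. It rebuilds the sample frame from the question (assuming the dates are plain strings, which the question does not state) and uses named aggregation instead of a dict. The final column reordering is only there to match the layout asked for in the question.

```python
import pandas as pd

# Sample data from the question; the dates are assumed to be plain strings.
df = pd.DataFrame({
    "col1": ["11/20", "11/21", "11/22"],
    "col2": ["11/26", "11/24", "11/27"],
    "col3": ["abc", "xxx", "abc"],
    "col4": [35, 30, 20],
})

# Named aggregation: output_column=(input_column, aggregation).
new_df = (
    df.groupby("col3", as_index=False)
      .agg(col1=("col1", "first"),
           col2=("col2", "last"),
           col4=("col4", "sum"))
)

# groupby puts the key column first; restore the original column order.
new_df = new_df[["col1", "col2", "col3", "col4"]]
print(new_df)
```

Note that "first" and "last" follow the row order within each group, so if the frame is not already in the order you care about, sort it before grouping.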
CodePudding user response: