Get first and last occurrence of duplicate value

I have a DataFrame like this:

index    col1     col2    col3    col4
  0      11/20    11/26   abc     35
  1      11/21    11/24   xxx     30
  2      11/22    11/27   abc     20

Here col3 contains a duplicated value (abc appears twice). I want to sum col4 across the rows that share the same col3 value.

So in this case I do:

df = df.groupby(['col3'])[['col4']].sum()

But with this approach I get:

index    col3    col4
  0      abc     55
  1      xxx     30

I would like to keep col1 from the first occurrence of the duplicated value (11/20) and col2 from its last occurrence (11/27), so the final result would look like this:

index    col1     col2    col3    col4
  0      11/20    11/27   abc     55
  1      11/21    11/24   xxx     30

Answer:

One way is to use DataFrameGroupBy.agg with a dictionary that maps each column to its own aggregation:

new_df = df.groupby("col3", as_index=False).agg({"col1": "first",
                                                 "col2": "last",
                                                 "col4": "sum"})
print(new_df)

Output:

  col3   col1   col2  col4
0  abc  11/20  11/27    55
1  xxx  11/21  11/24    30
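For a copy-and-paste check, here is a self-contained version of the same approach. It rebuilds the sample data from the question (assuming the dates are stored as plain strings, not parsed datetimes) and reorders the columns at the end to match the desired layout; the reordering step is an addition, not part of the answer above.

```python
import pandas as pd

# Sample data from the question; dates are assumed to be plain strings.
df = pd.DataFrame({
    "col1": ["11/20", "11/21", "11/22"],
    "col2": ["11/26", "11/24", "11/27"],
    "col3": ["abc", "xxx", "abc"],
    "col4": [35, 30, 20],
})

# One aggregation per column: first col1, last col2, total col4 per col3 group.
new_df = df.groupby("col3", as_index=False).agg(
    {"col1": "first", "col2": "last", "col4": "sum"}
)

# Optional: restore the column order shown in the question.
new_df = new_df[["col1", "col2", "col3", "col4"]]
print(new_df)
```

Note that "first" and "last" follow the current row order of df, so if the rows are not already in the order you care about, sort them before grouping.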

Answer:

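Pandas provides first and last aggregations (GroupBy.first and GroupBy.last), which return the first and last non-null value of each column within each group. They can be combined with sum in a single agg call.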
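A minimal sketch of that idea using pandas named aggregation, where each keyword is an output column and its value is an (input column, aggregation) pair. The sample data is rebuilt from the question, and the final column selection is only there to match the desired layout.

```python
import pandas as pd

# Sample data from the question; dates are assumed to be plain strings.
df = pd.DataFrame({
    "col1": ["11/20", "11/21", "11/22"],
    "col2": ["11/26", "11/24", "11/27"],
    "col3": ["abc", "xxx", "abc"],
    "col4": [35, 30, 20],
})

# Named aggregation: output_column=(input_column, aggregation).
# "first"/"last" take the first/last non-null value in each group,
# following the current row order of df.
result = (
    df.groupby("col3")
      .agg(col1=("col1", "first"), col2=("col2", "last"), col4=("col4", "sum"))
      .reset_index()                        # move col3 from the index back to a column
      [["col1", "col2", "col3", "col4"]]    # match the question's column order
)
print(result)
```

Named aggregation lets you rename the output columns freely (for example start=("col1", "first")), which the dictionary form does not.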