I have a pandas data frame like:
Id a b c d
x 1 1 1 01/01/2021
x 1 1 0 01/05/2021
y 1 1 1 02/01/2021
y 1 1 1 02/01/2021
Id a b c d
x 2 1 1 01/01/2021
y 1 1 1 02/01/2021
I want to sum the values for the column a b and c for the same Id if and only if the date in column d is different. In fact, in these case in the 2 occurrences x has different dates so I can sum the values, instead d has the same values so I just not take the sum but the first occurrence.
CodePudding user response:
dat[~dat[["d", "Id"]].duplicated()].groupby("Id", as_index=False).agg(
{nm: "first" if nm in ["d", "Id"] else "sum" for nm in dat.columns}
)
Id a b c d
0 x 2 2 1 01/01/2021
1 y 1 1 1 02/01/2021