I have the following df:
   A    B  C     D
0  foo  a  1200  300
0  foo  a  700   300
0  foo  b  1000  300
1  bar  b  270   70
1  bar  a  350   70
2  abc  c  270   300
2  abc  a  350   300
I want to display the sum of the values in column D grouped by column B, but I do not want to add the value in column D more than once for the same value in column A. That is, column D has only one value per value in column A.
foo will only ever have the value 300 and bar will only have the value 70 in column D. The values in this column are just repeated because I have repeated indexes.
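Counting each value in column A only once is only safe if this claim actually holds for the data. A minimal sketch that checks it on the example frame, assuming (as the question says) that the leftmost printed column is a repeated index rather than a data column:

```python
import pandas as pd

# The example frame from the question; the repeated index (0, 0, 0, 1, 1, 2, 2) is kept as-is
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Number of distinct D values for each value in column A
d_per_a = df.groupby("A")["D"].nunique()
print(d_per_a)

# True only if every value in column A maps to exactly one value in column D
one_d_per_a = bool(d_per_a.eq(1).all())
print(one_d_per_a)  # True
```

If this ever prints False, deduplicating by column A would hide real differences in column D, and the sums below would no longer be well defined.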
I want to print something like this (the exact formatting does not matter; I just need the correct sums):
a: 300 (from foo) 300 (from abc) 70 (from bar) = 670
b: 300 (from foo) 70 (from bar) = 370
c: 300 (from abc)
That is, values in column D should not be summed together if the value in column A is the same among them.
Answer:
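You could drop the duplicate (A, B) pairs before the groupby, so that each value in column A contributes its D value only once per group in column B, and then sum those values up. Calling pd.unique() on column D after the groupby is not enough: it removes repeated D values rather than repeated A values, so in group a the 300 from abc is treated as a duplicate of the 300 from foo, giving 370 instead of 670.
df.drop_duplicates(subset=['A', 'B']).groupby('B')['D'].sum()
B
a 670
b 370
c 300
Name: D, dtype: int64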
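Because the question states that column D holds a single value per value in column A, another option is to reduce each (B, A) pair to one D value with first() and then sum within each group of column B. The sketch below also builds lines in the style the question asks for; the " + " separator and the alphabetical order of the A values (from groupby's default sorting) are choices made for this sketch, not requirements from the question.

```python
import pandas as pd

# The example frame from the question, with its repeated index
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# One D value per (B, A) pair; first() is only safe because D is constant per A
per_pair = df.groupby(["B", "A"])["D"].first()

# Sum the per-pair values within each group of column B
totals = per_pair.groupby(level="B").sum()
print(totals)

# Build "a: 300 (from abc) + ... = 670"-style lines, one per value in column B
lines = []
for b, pairs in per_pair.groupby(level="B"):
    parts = [f"{d} (from {a})" for (_, a), d in pairs.items()]
    lines.append(f"{b}: {' + '.join(parts)} = {pairs.sum()}")

print("\n".join(lines))
# a: 300 (from abc) + 70 (from bar) + 300 (from foo) = 670
# b: 70 (from bar) + 300 (from foo) = 370
# c: 300 (from abc) = 300
```

If column D could vary within a value of column A, first() would silently keep only one of those values, so this variant only fits data shaped like the example.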