I have the following toy dataframe named df
:
df = pd.DataFrame({'foo' : ['red', 'red', 'red', 'blue', 'blue', 'blue', 'green', 'green', 'green'],
'bar' : [10, -5, 7, 14, 20, 3, 40, -100, 75]})
foo bar
red 10
red -5
red 7
blue 14
blue -20
blue 3
green 40
green -100
green 75
I would like to sum the absolute values of the bar
column using a groupby()
of the foo
column. Here is my attempt:
df['baz'] = df.groupby('foo').bar.apply(lambda x: x.abs().sum())
I would expect to see the following:
foo bar baz
red 10 22
red -5 22
red 7 22
blue 14 37
blue -20 37
blue 3 37
green 40 215
green -100 215
green 75 215
Unfortunately, I'm getting NaN
in the newly created baz
column.
Why is this approach resulting in NaN
values? Separately (and notwithstanding the bug in my code), is there a more elegant (i.e. more Pythonic) way to do this?
Thanks!
CodePudding user response:
pandas
aligns on index
automatically, so your code does not work because the index
of your groupby
result is different than the index of your original dataframe. Consequently, it returns nan
.
Having said that, you can set the index to be the 'foo' column, using set_index()
, and retry your code:
df.set_index('foo',inplace=True)
df['baz'] = df.groupby('foo').bar.apply(lambda x: x.abs().sum())
print(df)
bar baz
foo
red 10 22
red -5 22
red 7 22
blue 14 37
blue 20 37
blue 3 37
green 40 215
green -100 215
green 75 215
A more pythonic way, as you say I think would be to use transform
, instead of setting the index
and distort your original dataframe. Therefore, my recommendation would be keep your dataframe as it is, and change your code to:
df['baz'] = df.groupby('foo').bar.transform(lambda x: x.abs().sum())
print(df)
foo bar baz
0 red 10 22
1 red -5 22
2 red 7 22
3 blue 14 37
4 blue 20 37
5 blue 3 37
6 green 40 215
7 green -100 215
8 green 75 215
CodePudding user response:
Try with
df['baz'] = df.bar.abs().groupby(df['foo']).transform('sum')