I'm trying to calculate a conditional sum that involves a lookup in another dataframe.
import pandas as pd
first = pd.DataFrame([{"a": "aaa", "b": 2, "c": "bla", "d": 1}, {"a": "bbb", "b": 3, "c": "bla", "d": 1}, {"a": "aaa", "b": 4, "c": "bla", "d": 1}, {"a": "ccc", "b": 11, "c": "bla", "d": 1}, {"a": "bbb", "b": 23, "c": "bla", "d": 1}])
second = pd.DataFrame([{"a": "aaa", "val": 111}, {"a": "bbb", "val": 222}, {"a": "ccc", "val": 333}, {"a": "ddd", "val": 444}])
print(first)
print(second)
The two DataFrames are
a b c d
0 aaa 2 bla 1
1 bbb 3 bla 1
2 aaa 4 bla 1
3 ccc 11 bla 1
4 bbb 23 bla 1
and
a val
0 aaa 111
1 bbb 222
2 ccc 333
3 ddd 444
I want to append a column in second
that has the sum of column b
in first
in which first.a
matches the corresponding second.a
. The expected result is:
a val result
0 aaa 111 6
1 bbb 222 26
2 ccc 333 11
3 ddd 444 0
Note that this is a minimal example and I'd ideally see a generalizable solution that uses lambda or other functions and not a specific hack that works with this specific example.
CodePudding user response:
You can use pandas.DataFrame.groupby
then use pandas.DataFrame.merge
on the result of groupby.
g = first.groupby('a')['b'].sum().rename('result')
result = second.merge(g, on='a', how='left').fillna(0)
print(result)
Output:
a val result
0 aaa 111 6.0
1 bbb 222 26.0
2 ccc 333 11.0
3 ddd 444 0.0
CodePudding user response:
This should probably work. (came late...)
temp = first[['a','b']].groupby('a').sum().rename({'b':'result'}, axis = 1)
df = pd.merge(second, temp, left_on='a', right_index=True, how ='outer').fillna(0)