Python numpy dataframe conditional operation (e.g. sum) across two dataframes-CodePudding

I'm trying to calculate a conditional sum that involves a lookup in another dataframe.

import pandas as pd
first = pd.DataFrame([{"a": "aaa", "b": 2, "c": "bla", "d": 1}, {"a": "bbb", "b": 3, "c": "bla", "d": 1}, {"a": "aaa", "b": 4, "c": "bla", "d": 1}, {"a": "ccc", "b": 11, "c": "bla", "d": 1}, {"a": "bbb", "b": 23, "c": "bla", "d": 1}])
second = pd.DataFrame([{"a": "aaa", "val": 111}, {"a": "bbb", "val": 222}, {"a": "ccc", "val": 333}, {"a": "ddd", "val": 444}])

print(first)
print(second)

The two DataFrames are

     a   b    c  d
0  aaa   2  bla  1
1  bbb   3  bla  1
2  aaa   4  bla  1
3  ccc  11  bla  1
4  bbb  23  bla  1

and

     a  val
0  aaa  111
1  bbb  222
2  ccc  333
3  ddd  444

I want to append a column in second that has the sum of column b in first in which first.a matches the corresponding second.a. The expected result is:

     a  val result
0  aaa  111      6
1  bbb  222     26
2  ccc  333     11
3  ddd  444      0

Note that this is a minimal example and I'd ideally see a generalizable solution that uses lambda or other functions and not a specific hack that works with this specific example.

CodePudding user response：

You can use pandas.DataFrame.groupby then use pandas.DataFrame.merge on the result of groupby.

g = first.groupby('a')['b'].sum().rename('result')
result = second.merge(g, on='a', how='left').fillna(0)
print(result)

Output:

     a  val  result
0  aaa  111     6.0
1  bbb  222    26.0
2  ccc  333    11.0
3  ddd  444     0.0

CodePudding user response：

This should probably work. (came late...)

temp = first[['a','b']].groupby('a').sum().rename({'b':'result'}, axis = 1)
df = pd.merge(second, temp, left_on='a', right_index=True, how ='outer').fillna(0)