Table A
Item amount year
A 100 2011
A 110 2011
A 125 2012
A 25 2012
B 54 2013
C 55 2020
D 68 2022
Table B
item year desc
A 2011 xxx
A 2011 xxx
A 2012 xxx
B 2011 xxx
B 2012 xxx
B 2013 xxx
B 2014 xxx
C 2020 xxx
D 2022 xxx
Table C
item year desc Total
A 2011 xxx 220
A 2011 xxx 0
A 2012 xxx 150
B 2011 xxx 0
B 2012 xxx 0
B 2013 xxx 54
B 2014 xxx 0
C 2020 xxx 55
D 2022 xxx 68
These are example tables that represent dataframes that I'm going to create from seperate excel sheets.
Basically there's a many to many relationship going on and I want to be able to create a combined sheet that will roll up the "amount" total (from Table A) for each year of the item (Table B). I don't want to duplicate the amounts for multiple items with the same year as you see in table A, item A has two 2011 entries.
What I'd like to get is a result like Table C below
I'm currently trying to use pandas merge but not having much luck. Can anyone lend a hand?
CodePudding user response:
IIUC, you can use a merge
and post-process to remove the duplicates per year:
out = (dfB
.merge(dfA.rename(columns={'Item': 'item'})
.groupby(['item', 'year'], as_index=False).sum(), how='left')
.assign(amount=lambda d: d['amount']
.mask(d.groupby('year').cumcount().gt(0), 0)
.fillna(0)
)
)
output:
item year desc amount
0 A 2011 xxx 210.0
1 A 2011 xxx 0.0
2 A 2012 xxx 150.0
3 B 2011 xxx 0.0
4 B 2012 xxx 0.0
5 B 2013 xxx 54.0
6 B 2014 xxx 0.0
7 C 2020 xxx 55.0
8 D 2022 xxx 68.0