I have two pandas dataframes like this:
df1= pd.DataFrame({'sub-group':['2020','2030','2040','2030','2040','2030','2040'],
'group':['a', 'a', 'a', 'b', 'b', 'c', 'c'],
'value1':[12,11,41,33,66,22,20]})
sub-group group value1
2020 a 12
2030 a 11
2040 a 41
2030 b 33
2040 b 66
2030 c 22
2040 c 20
df2= pd.DataFrame({'sub-group':['2020','2030','2040', '2020', '2030','2040','2030','2040'],
'group':['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
'value2':[10,20,30,15,45,60,12,36]})
sub-group group value2
2020 a 10
2030 a 20
2040 a 30
2020 b 15
2030 b 45
2040 b 60
2030 c 12
2040 c 36
I want to find valu1/value2
for each group and sub-group. Note that the number of observations might not match in two dataframes. for example, we have 2020/b in df2 but not in df1. I those cases a nan or 0 would work.
I was thinking that it should be possible with pd.groupby
but I don't know how it works with two dataframes.
Thanks.
CodePudding user response:
For align groups
and sub-group
create MultiIndex Series
by DataFrame.set_index
, then divide by Series.div
, last for 3 columns DataFrame add Series.reset_index
:
df = (df1.set_index(['group','sub-group'])['value1']
.div(df2.set_index(['group','sub-group'])['value2'])
.reset_index(name='out'))
print (df)
group sub-group out
0 a 2020 1.200000
1 a 2030 0.550000
2 a 2040 1.366667
3 b 2020 NaN
4 b 2030 0.733333
5 b 2040 1.100000
6 c 2030 1.833333
7 c 2040 0.555556
CodePudding user response:
You can use a merge
to align the Series:
df2.merge(df1, on=['group', 'sub-group'], how='outer').eval('new = value1/value2')
NB. I'm using eval
here for a concise way to perform the computation, but you can also retrieve "value1"/"value2" and do whatever you want. Another way is df2.merge(df1, on=['group', 'sub-group'], how='outer').assign(new=lambda d: d['value1'].div(d['value2']))
.
Output:
sub-group group value2 value1 new
0 2020 a 10 12.0 1.200000
1 2030 a 20 11.0 0.550000
2 2040 a 30 41.0 1.366667
3 2020 b 15 NaN NaN
4 2030 b 45 33.0 0.733333
5 2040 b 60 66.0 1.100000
6 2030 c 12 22.0 1.833333
7 2040 c 36 20.0 0.555556
If you want a different order of the rows, just swap df1
/df2
:
df1.merge(df2, on=['group', 'sub-group'], how='outer').eval('new = value1/value2')
Output:
sub-group group value1 value2 new
0 2020 a 12.0 10 1.200000
1 2030 a 11.0 20 0.550000
2 2040 a 41.0 30 1.366667
3 2030 b 33.0 45 0.733333
4 2040 b 66.0 60 1.100000
5 2030 c 22.0 12 1.833333
6 2040 c 20.0 36 0.555556
7 2020 b NaN 15 NaN
CodePudding user response:
You can just groupby and divide the two dataframes (I put the sum as aggregate method in case you have more rows with the same combination group
and sub-group
)
final_df = df1.groupby(['group', 'sub-group'])['value1'].sum()/df2.groupby(['group', 'sub-group'])['value2'].sum()
Output:
group sub-group
a 2020 1.200000
2030 0.550000
2040 1.366667
b 2020 NaN
2030 0.733333
2040 1.100000
c 2030 1.833333
2040 0.555556