Doing group calculations with two separate dataframes in python-CodePudding

I have two pandas dataframes like this:

df1= pd.DataFrame({'sub-group':['2020','2030','2040','2030','2040','2030','2040'], 
                      'group':['a', 'a', 'a', 'b', 'b', 'c', 'c'],
                      'value1':[12,11,41,33,66,22,20]})

      sub-group  group  value1
      2020       a      12
      2030       a      11
      2040       a      41
      2030       b      33
      2040       b      66
      2030       c      22
      2040       c      20

df2= pd.DataFrame({'sub-group':['2020','2030','2040', '2020', '2030','2040','2030','2040'], 
                      'group':['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
                      'value2':[10,20,30,15,45,60,12,36]})

      sub-group group  value2
      2020      a      10
      2030      a      20
      2040      a      30
      2020      b      15
      2030      b      45
      2040      b      60
      2030      c      12
      2040      c      36

I want to find valu1/value2 for each group and sub-group. Note that the number of observations might not match in two dataframes. for example, we have 2020/b in df2 but not in df1. I those cases a nan or 0 would work.

I was thinking that it should be possible with pd.groupby but I don't know how it works with two dataframes. Thanks.

CodePudding user response：

For align groups and sub-group create MultiIndex Series by DataFrame.set_index, then divide by Series.div, last for 3 columns DataFrame add Series.reset_index:

df = (df1.set_index(['group','sub-group'])['value1']
         .div(df2.set_index(['group','sub-group'])['value2'])
         .reset_index(name='out'))
print (df)
  group sub-group       out
0     a      2020  1.200000
1     a      2030  0.550000
2     a      2040  1.366667
3     b      2020       NaN
4     b      2030  0.733333
5     b      2040  1.100000
6     c      2030  1.833333
7     c      2040  0.555556

CodePudding user response：

You can use a merge to align the Series:

df2.merge(df1, on=['group', 'sub-group'], how='outer').eval('new = value1/value2')

NB. I'm using eval here for a concise way to perform the computation, but you can also retrieve "value1"/"value2" and do whatever you want. Another way is df2.merge(df1, on=['group', 'sub-group'], how='outer').assign(new=lambda d: d['value1'].div(d['value2'])).

Output:

  sub-group group  value2  value1       new
0      2020     a      10    12.0  1.200000
1      2030     a      20    11.0  0.550000
2      2040     a      30    41.0  1.366667
3      2020     b      15     NaN       NaN
4      2030     b      45    33.0  0.733333
5      2040     b      60    66.0  1.100000
6      2030     c      12    22.0  1.833333
7      2040     c      36    20.0  0.555556

If you want a different order of the rows, just swap df1/df2:

df1.merge(df2, on=['group', 'sub-group'], how='outer').eval('new = value1/value2')

Output:

  sub-group group  value1  value2       new
0      2020     a    12.0      10  1.200000
1      2030     a    11.0      20  0.550000
2      2040     a    41.0      30  1.366667
3      2030     b    33.0      45  0.733333
4      2040     b    66.0      60  1.100000
5      2030     c    22.0      12  1.833333
6      2040     c    20.0      36  0.555556
7      2020     b     NaN      15       NaN

CodePudding user response：

You can just groupby and divide the two dataframes (I put the sum as aggregate method in case you have more rows with the same combination group and sub-group)

final_df = df1.groupby(['group', 'sub-group'])['value1'].sum()/df2.groupby(['group', 'sub-group'])['value2'].sum()

Output:

group  sub-group
a      2020         1.200000
       2030         0.550000
       2040         1.366667
b      2020              NaN
       2030         0.733333
       2040         1.100000
c      2030         1.833333
       2040         0.555556