Home > database >  Divide two separate columns from two separate dataframes using common index
Divide two separate columns from two separate dataframes using common index

Time:06-24

I have two separate dataframes (df1 and df2) with similar columns and I am trying to divide one column in df1 by another column in df2.

dict1 = { 'sex': {6: 'SEX_M',
  7: 'SEX_M',
  8: 'SEX_M',
  9: 'SEX_M',
  10: 'SEX_M',
  11: 'SEX_M',
  12: 'SEX_F',
  13: 'SEX_F',
  14: 'SEX_F',
  15: 'SEX_F',
  16: 'SEX_F',
  17: 'SEX_F'},
 'classif1': {6: 'AGE_YTHADULT_YGE15',
  7: 'AGE_YTHADULT_YGE15',
  8: 'AGE_YTHADULT_YGE15',
  9: 'AGE_YTHADULT_YGE15',
  10: 'AGE_YTHADULT_YGE15',
  11: 'AGE_YTHADULT_YGE15',
  12: 'AGE_YTHADULT_YGE15',
  13: 'AGE_YTHADULT_YGE15',
  14: 'AGE_YTHADULT_YGE15',
  15: 'AGE_YTHADULT_YGE15',
  16: 'AGE_YTHADULT_YGE15',
  17: 'AGE_YTHADULT_YGE15'},
 'classif2': {6: 'ECO_SECTOR_TOTAL',
  7: 'ECO_SECTOR_AGR',
  8: 'ECO_SECTOR_NAG',
  9: 'ECO_SECTOR_IND',
  10: 'ECO_SECTOR_SER',
  11: 'ECO_SECTOR_X',
  12: 'ECO_SECTOR_TOTAL',
  13: 'ECO_SECTOR_AGR',
  14: 'ECO_SECTOR_NAG',
  15: 'ECO_SECTOR_IND',
  16: 'ECO_SECTOR_SER',
  17: 'ECO_SECTOR_X'},
 'obs_value': {6: 4935.407,
  7: 2476.535,
  8: 2458.872,
  9: 685.795,
  10: 1763.5,
  11: 9.577,
  12: 2551.157,
  13: 1950.196,
  14: 600.961,
  15: 245.793,
  16: 343.926,
  17: 11.242}}

df1 = pd.DataFrame(dict1)

dict2 = {'sex': {0: 'SEX_T',
  1: 'SEX_T',
  2: 'SEX_T',
  3: 'SEX_T',
  4: 'SEX_T',
  5: 'SEX_T'},
 'classif1': {0: 'AGE_YTHADULT_YGE15',
  1: 'AGE_YTHADULT_YGE15',
  2: 'AGE_YTHADULT_YGE15',
  3: 'AGE_YTHADULT_YGE15',
  4: 'AGE_YTHADULT_YGE15',
  5: 'AGE_YTHADULT_YGE15'},
 'classif2': {0: 'ECO_SECTOR_TOTAL',
  1: 'ECO_SECTOR_AGR',
  2: 'ECO_SECTOR_NAG',
  3: 'ECO_SECTOR_IND',
  4: 'ECO_SECTOR_SER',
  5: 'ECO_SECTOR_X'},
 'obs_value': {0: 7486.564,
  1: 4426.731,
  2: 3059.833,
  3: 931.588,
  4: 2107.426,
  5: 20.819}}

df2 = pd.DataFrame(dict2)

I am trying to create a new column in the first dataframe (df1) called Percentage_of_total by dividing df1['obs_value'] by df2['obs_value']. Except for the column sex and obs_value in both dfs, the values are identical in the other columns. I tried creating a common index and just dividing the columns I'm interested in, but I get the following error:

ValueError: cannot handle a non-unique multi-index!

My code is below:

df1 = df1.set_index(['classif1','classif2'])
df2 = df2.set_index(['classif1','classif2'])

df1['Percentage_of_total'] = df1['obs_value']/ df2['obs_value']

CodePudding user response:

You can use a left merge to align the values, then divide as numpy array:

cols = ['classif1', 'classif2']
df1['Percentage_of_total'] =  (df1['obs_value']
                               .div(df1[cols].merge(df2, on=cols, how='left')
                                    ['obs_value'].to_numpy())
                              )

output:

      sex            classif1          classif2  obs_value  Percentage_of_total
6   SEX_M  AGE_YTHADULT_YGE15  ECO_SECTOR_TOTAL   4935.407             0.659235
7   SEX_M  AGE_YTHADULT_YGE15    ECO_SECTOR_AGR   2476.535             0.559450
8   SEX_M  AGE_YTHADULT_YGE15    ECO_SECTOR_NAG   2458.872             0.803597
9   SEX_M  AGE_YTHADULT_YGE15    ECO_SECTOR_IND    685.795             0.736157
10  SEX_M  AGE_YTHADULT_YGE15    ECO_SECTOR_SER   1763.500             0.836803
11  SEX_M  AGE_YTHADULT_YGE15      ECO_SECTOR_X      9.577             0.460012
12  SEX_F  AGE_YTHADULT_YGE15  ECO_SECTOR_TOTAL   2551.157             0.340765
13  SEX_F  AGE_YTHADULT_YGE15    ECO_SECTOR_AGR   1950.196             0.440550
14  SEX_F  AGE_YTHADULT_YGE15    ECO_SECTOR_NAG    600.961             0.196403
15  SEX_F  AGE_YTHADULT_YGE15    ECO_SECTOR_IND    245.793             0.263843
16  SEX_F  AGE_YTHADULT_YGE15    ECO_SECTOR_SER    343.926             0.163197
17  SEX_F  AGE_YTHADULT_YGE15      ECO_SECTOR_X     11.242             0.539988

CodePudding user response:

As you have already set the index, you can simply join the results -

df1.join(df1['obs_value']/ df2['obs_value'], rsuffix='_Percentage_of_total')

Output

classif1           classif2                                                        
AGE_YTHADULT_YGE15 ECO_SECTOR_AGR    SEX_M   2476.535                      0.559450
                   ECO_SECTOR_AGR    SEX_M   2476.535                      0.440550
                   ECO_SECTOR_AGR    SEX_F   1950.196                      0.559450
                   ECO_SECTOR_AGR    SEX_F   1950.196                      0.440550
                   ECO_SECTOR_IND    SEX_M    685.795                      0.736157
                   ECO_SECTOR_IND    SEX_M    685.795                      0.263843
                   ECO_SECTOR_IND    SEX_F    245.793                      0.736157
                   ECO_SECTOR_IND    SEX_F    245.793                      0.263843
                   ECO_SECTOR_NAG    SEX_M   2458.872                      0.803597
                   ECO_SECTOR_NAG    SEX_M   2458.872                      0.196403
                   ECO_SECTOR_NAG    SEX_F    600.961                      0.803597
                   ECO_SECTOR_NAG    SEX_F    600.961                      0.196403
                   ECO_SECTOR_SER    SEX_M   1763.500                      0.836803
                   ECO_SECTOR_SER    SEX_M   1763.500                      0.163197
                   ECO_SECTOR_SER    SEX_F    343.926                      0.836803
                   ECO_SECTOR_SER    SEX_F    343.926                      0.163197
                   ECO_SECTOR_TOTAL  SEX_M   4935.407                      0.659235
                   ECO_SECTOR_TOTAL  SEX_M   4935.407                      0.340765
                   ECO_SECTOR_TOTAL  SEX_F   2551.157                      0.659235
                   ECO_SECTOR_TOTAL  SEX_F   2551.157                      0.340765
                   ECO_SECTOR_X      SEX_M      9.577                      0.460012
                   ECO_SECTOR_X      SEX_M      9.577                      0.539988
                   ECO_SECTOR_X      SEX_F     11.242                      0.460012

CodePudding user response:

You can either use a merge as stated in an answer above. Another option can be applying this function along the rows of df1:

def func(x):
    x['pct'] = x.obs_value / np.sum(df2.query('[email protected] & [email protected]').obs_value)
    return x
df1.apply(func, axis=1)

What happens here is you iterate over the rows and add a percentage column to the series.

The output is as follows:

sex classif1    classif2    obs_value   pct
6   SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_TOTAL    4935.407    0.659235
7   SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_AGR  2476.535    0.559450
8   SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_NAG  2458.872    0.803597
9   SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_IND  685.795 0.736157
10  SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_SER  1763.500    0.836803
11  SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_X    9.577   0.460012
12  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_TOTAL    2551.157    0.340765
13  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_AGR  1950.196    0.440550
14  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_NAG  600.961 0.196403
15  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_IND  245.793 0.263843
16  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_SER  343.926 0.163197
17  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_X    11.242  0.539988

CodePudding user response:

@mozway's solution is definitely cleaner and more general. However, if you - for some reason - don't want to use merge or join, you could also use the following logic for your specific data frame:

import numpy as np

df1['percentage'] = df1.obs_value.values / np.array(df2.obs_value.to_list() * 2)
df1

    sex     classif1            classif2          obs_value percentage
6   SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_TOTAL  4935.407  0.659235
7   SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_AGR    2476.535  0.559450
8   SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_NAG    2458.872  0.803597
9   SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_IND    685.795   0.736157
10  SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_SER    1763.500  0.836803
11  SEX_M   AGE_YTHADULT_YGE15  ECO_SECTOR_X      9.577     0.460012
12  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_TOTAL  2551.157  0.340765
13  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_AGR    1950.196  0.440550
14  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_NAG    600.961   0.196403
15  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_IND    245.793   0.263843
16  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_SER    343.926   0.163197
17  SEX_F   AGE_YTHADULT_YGE15  ECO_SECTOR_X      11.242    0.539988
  • Related