I have two separate dataframes (df1
and df2
) with similar columns and I am trying to divide one column in df1
by another column in df2
.
dict1 = { 'sex': {6: 'SEX_M',
7: 'SEX_M',
8: 'SEX_M',
9: 'SEX_M',
10: 'SEX_M',
11: 'SEX_M',
12: 'SEX_F',
13: 'SEX_F',
14: 'SEX_F',
15: 'SEX_F',
16: 'SEX_F',
17: 'SEX_F'},
'classif1': {6: 'AGE_YTHADULT_YGE15',
7: 'AGE_YTHADULT_YGE15',
8: 'AGE_YTHADULT_YGE15',
9: 'AGE_YTHADULT_YGE15',
10: 'AGE_YTHADULT_YGE15',
11: 'AGE_YTHADULT_YGE15',
12: 'AGE_YTHADULT_YGE15',
13: 'AGE_YTHADULT_YGE15',
14: 'AGE_YTHADULT_YGE15',
15: 'AGE_YTHADULT_YGE15',
16: 'AGE_YTHADULT_YGE15',
17: 'AGE_YTHADULT_YGE15'},
'classif2': {6: 'ECO_SECTOR_TOTAL',
7: 'ECO_SECTOR_AGR',
8: 'ECO_SECTOR_NAG',
9: 'ECO_SECTOR_IND',
10: 'ECO_SECTOR_SER',
11: 'ECO_SECTOR_X',
12: 'ECO_SECTOR_TOTAL',
13: 'ECO_SECTOR_AGR',
14: 'ECO_SECTOR_NAG',
15: 'ECO_SECTOR_IND',
16: 'ECO_SECTOR_SER',
17: 'ECO_SECTOR_X'},
'obs_value': {6: 4935.407,
7: 2476.535,
8: 2458.872,
9: 685.795,
10: 1763.5,
11: 9.577,
12: 2551.157,
13: 1950.196,
14: 600.961,
15: 245.793,
16: 343.926,
17: 11.242}}
df1 = pd.DataFrame(dict1)
dict2 = {'sex': {0: 'SEX_T',
1: 'SEX_T',
2: 'SEX_T',
3: 'SEX_T',
4: 'SEX_T',
5: 'SEX_T'},
'classif1': {0: 'AGE_YTHADULT_YGE15',
1: 'AGE_YTHADULT_YGE15',
2: 'AGE_YTHADULT_YGE15',
3: 'AGE_YTHADULT_YGE15',
4: 'AGE_YTHADULT_YGE15',
5: 'AGE_YTHADULT_YGE15'},
'classif2': {0: 'ECO_SECTOR_TOTAL',
1: 'ECO_SECTOR_AGR',
2: 'ECO_SECTOR_NAG',
3: 'ECO_SECTOR_IND',
4: 'ECO_SECTOR_SER',
5: 'ECO_SECTOR_X'},
'obs_value': {0: 7486.564,
1: 4426.731,
2: 3059.833,
3: 931.588,
4: 2107.426,
5: 20.819}}
df2 = pd.DataFrame(dict2)
I am trying to create a new column in the first dataframe (df1
) called Percentage_of_total
by dividing df1['obs_value']
by df2['obs_value']
. Except for the column sex
and obs_value
in both dfs, the values are identical in the other columns. I tried creating a common index and just dividing the columns I'm interested in, but I get the following error:
ValueError: cannot handle a non-unique multi-index!
My code is below:
df1 = df1.set_index(['classif1','classif2'])
df2 = df2.set_index(['classif1','classif2'])
df1['Percentage_of_total'] = df1['obs_value']/ df2['obs_value']
CodePudding user response:
You can use a left merge
to align the values, then divide as numpy array:
cols = ['classif1', 'classif2']
df1['Percentage_of_total'] = (df1['obs_value']
.div(df1[cols].merge(df2, on=cols, how='left')
['obs_value'].to_numpy())
)
output:
sex classif1 classif2 obs_value Percentage_of_total
6 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_TOTAL 4935.407 0.659235
7 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_AGR 2476.535 0.559450
8 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_NAG 2458.872 0.803597
9 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_IND 685.795 0.736157
10 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_SER 1763.500 0.836803
11 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_X 9.577 0.460012
12 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_TOTAL 2551.157 0.340765
13 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_AGR 1950.196 0.440550
14 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_NAG 600.961 0.196403
15 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_IND 245.793 0.263843
16 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_SER 343.926 0.163197
17 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_X 11.242 0.539988
CodePudding user response:
As you have already set the index, you can simply join the results -
df1.join(df1['obs_value']/ df2['obs_value'], rsuffix='_Percentage_of_total')
Output
classif1 classif2
AGE_YTHADULT_YGE15 ECO_SECTOR_AGR SEX_M 2476.535 0.559450
ECO_SECTOR_AGR SEX_M 2476.535 0.440550
ECO_SECTOR_AGR SEX_F 1950.196 0.559450
ECO_SECTOR_AGR SEX_F 1950.196 0.440550
ECO_SECTOR_IND SEX_M 685.795 0.736157
ECO_SECTOR_IND SEX_M 685.795 0.263843
ECO_SECTOR_IND SEX_F 245.793 0.736157
ECO_SECTOR_IND SEX_F 245.793 0.263843
ECO_SECTOR_NAG SEX_M 2458.872 0.803597
ECO_SECTOR_NAG SEX_M 2458.872 0.196403
ECO_SECTOR_NAG SEX_F 600.961 0.803597
ECO_SECTOR_NAG SEX_F 600.961 0.196403
ECO_SECTOR_SER SEX_M 1763.500 0.836803
ECO_SECTOR_SER SEX_M 1763.500 0.163197
ECO_SECTOR_SER SEX_F 343.926 0.836803
ECO_SECTOR_SER SEX_F 343.926 0.163197
ECO_SECTOR_TOTAL SEX_M 4935.407 0.659235
ECO_SECTOR_TOTAL SEX_M 4935.407 0.340765
ECO_SECTOR_TOTAL SEX_F 2551.157 0.659235
ECO_SECTOR_TOTAL SEX_F 2551.157 0.340765
ECO_SECTOR_X SEX_M 9.577 0.460012
ECO_SECTOR_X SEX_M 9.577 0.539988
ECO_SECTOR_X SEX_F 11.242 0.460012
CodePudding user response:
You can either use a merge as stated in an answer above. Another option can be applying this function along the rows of df1:
def func(x):
x['pct'] = x.obs_value / np.sum(df2.query('[email protected] & [email protected]').obs_value)
return x
df1.apply(func, axis=1)
What happens here is you iterate over the rows and add a percentage column to the series.
The output is as follows:
sex classif1 classif2 obs_value pct
6 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_TOTAL 4935.407 0.659235
7 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_AGR 2476.535 0.559450
8 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_NAG 2458.872 0.803597
9 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_IND 685.795 0.736157
10 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_SER 1763.500 0.836803
11 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_X 9.577 0.460012
12 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_TOTAL 2551.157 0.340765
13 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_AGR 1950.196 0.440550
14 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_NAG 600.961 0.196403
15 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_IND 245.793 0.263843
16 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_SER 343.926 0.163197
17 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_X 11.242 0.539988
CodePudding user response:
@mozway's solution is definitely cleaner and more general. However, if you - for some reason - don't want to use merge
or join
, you could also use the following logic for your specific data frame:
import numpy as np
df1['percentage'] = df1.obs_value.values / np.array(df2.obs_value.to_list() * 2)
df1
sex classif1 classif2 obs_value percentage
6 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_TOTAL 4935.407 0.659235
7 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_AGR 2476.535 0.559450
8 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_NAG 2458.872 0.803597
9 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_IND 685.795 0.736157
10 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_SER 1763.500 0.836803
11 SEX_M AGE_YTHADULT_YGE15 ECO_SECTOR_X 9.577 0.460012
12 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_TOTAL 2551.157 0.340765
13 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_AGR 1950.196 0.440550
14 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_NAG 600.961 0.196403
15 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_IND 245.793 0.263843
16 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_SER 343.926 0.163197
17 SEX_F AGE_YTHADULT_YGE15 ECO_SECTOR_X 11.242 0.539988