I am trying to add a list of column sums back to my pandas dataframe through assign()
. But I am not so sure how to do it when there's more than one column. What's the best way to do it or any other way to do it in a chain style given I have had other steps before?
data2.assign(data2[rate_name].abs() / data2.groupby(level = 'date')[rate_name].transform('sum'))
rate_water rate_fire rate_wood
id date
apple 2019-01-01 -0.500000 -0.500000 0.000000
orange 2019-01-01 -0.636364 -0.963636 3.000000
melon 2019-01-01 -0.333333 5.666667 27.888889
apple 2020-01-01 -0.100000 7.900000 76.000000
orange 2020-01-01 0.363636 -0.963636 26.500000
melon 2020-01-01 0.166667 6.166667 27.235043
apple 2021-01-01 0.328571 26.261702 84.220779
orange 2021-01-01 0.363636 28.036364 28.683673
melon 2021-01-01 0.611111 39.944444 27.679487
Reproducible:
from pandas import Timestamp
data2 = pd.DataFrame.from_dict({'rate_water': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.6363636363636364, ('melon', Timestamp('2019-01-01 00:00:00')): -0.33333333333333337, ('apple', Timestamp('2020-01-01 00:00:00')): -0.10000000000000009, ('orange', Timestamp('2020-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2020-01-01 00:00:00')): 0.16666666666666663, ('apple', Timestamp('2021-01-01 00:00:00')): 0.3285714285714285, ('orange', Timestamp('2021-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2021-01-01 00:00:00')): 0.611111111111111}, 'rate_fire': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2019-01-01 00:00:00')): 5.666666666666667, ('apple', Timestamp('2020-01-01 00:00:00')): 7.9, ('orange', Timestamp('2020-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2020-01-01 00:00:00')): 6.166666666666667, ('apple', Timestamp('2021-01-01 00:00:00')): 26.261702127659575, ('orange', Timestamp('2021-01-01 00:00:00')): 28.036363636363635, ('melon', Timestamp('2021-01-01 00:00:00')): 39.94444444444444}, 'rate_wood': {('apple', Timestamp('2019-01-01 00:00:00')): 0.0, ('orange', Timestamp('2019-01-01 00:00:00')): 3.0, ('melon', Timestamp('2019-01-01 00:00:00')): 27.88888888888889, ('apple', Timestamp('2020-01-01 00:00:00')): 76.0, ('orange', Timestamp('2020-01-01 00:00:00')): 26.5, ('melon', Timestamp('2020-01-01 00:00:00')): 27.235042735042736, ('apple', Timestamp('2021-01-01 00:00:00')): 84.22077922077922, ('orange', Timestamp('2021-01-01 00:00:00')): 28.683673469387756, ('melon', Timestamp('2021-01-01 00:00:00')): 27.67948717948718}})
rate_water rate_fire rate_wood sum_water sum_fire sum_wood
id date
apple 2019-01-01 -0.500000 -0.500000 0.000000 -1.469697 4.20303 30.888889
orange 2019-01-01 -0.636364 -0.963636 3.000000 -1.469697 4.20303 30.888889
melon 2019-01-01 -0.333333 5.666667 27.888889 -1.469697 4.20303 30.888889
apple 2020-01-01 -0.100000 7.900000 76.000000 0.430303 13.10303 129.735043
orange 2020-01-01 0.363636 -0.963636 26.500000 0.430303 13.10303 129.735043
melon 2020-01-01 0.166667 6.166667 27.235043 0.430303 13.10303 129.735043
apple 2021-01-01 0.328571 26.261702 84.220779 1.303319 94.24251 140.583940
orange 2021-01-01 0.363636 28.036364 28.683673 1.303319 94.24251 140.583940
melon 2021-01-01 0.611111 39.944444 27.679487 1.303319 94.24251 140.583940
CodePudding user response:
Use dictionary comprehension for dict of Series and add it to dataFrame with unpack **
:
from pandas import Timestamp
data2 = pd.DataFrame.from_dict({'rate_water': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.6363636363636364, ('melon', Timestamp('2019-01-01 00:00:00')): -0.33333333333333337, ('apple', Timestamp('2020-01-01 00:00:00')): -0.10000000000000009, ('orange', Timestamp('2020-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2020-01-01 00:00:00')): 0.16666666666666663, ('apple', Timestamp('2021-01-01 00:00:00')): 0.3285714285714285, ('orange', Timestamp('2021-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2021-01-01 00:00:00')): 0.611111111111111}, 'rate_fire': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2019-01-01 00:00:00')): 5.666666666666667, ('apple', Timestamp('2020-01-01 00:00:00')): 7.9, ('orange', Timestamp('2020-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2020-01-01 00:00:00')): 6.166666666666667, ('apple', Timestamp('2021-01-01 00:00:00')): 26.261702127659575, ('orange', Timestamp('2021-01-01 00:00:00')): 28.036363636363635, ('melon', Timestamp('2021-01-01 00:00:00')): 39.94444444444444}, 'rate_wood': {('apple', Timestamp('2019-01-01 00:00:00')): 0.0, ('orange', Timestamp('2019-01-01 00:00:00')): 3.0, ('melon', Timestamp('2019-01-01 00:00:00')): 27.88888888888889, ('apple', Timestamp('2020-01-01 00:00:00')): 76.0, ('orange', Timestamp('2020-01-01 00:00:00')): 26.5, ('melon', Timestamp('2020-01-01 00:00:00')): 27.235042735042736, ('apple', Timestamp('2021-01-01 00:00:00')): 84.22077922077922, ('orange', Timestamp('2021-01-01 00:00:00')): 28.683673469387756, ('melon', Timestamp('2021-01-01 00:00:00')): 27.67948717948718}})
data2.index.names=['id','date']
cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.assign(**{rate_name.replace('rate','sum'):
data2[rate_name].abs() / data2.groupby(level = 'date')[rate_name].transform('sum')
for rate_name in cols})
print (data2)
rate_water rate_fire rate_wood sum_water sum_fire \
id date
apple 2019-01-01 -0.500000 -0.500000 0.000000 -0.340206 0.118962
orange 2019-01-01 -0.636364 -0.963636 3.000000 -0.432990 0.229272
melon 2019-01-01 -0.333333 5.666667 27.888889 -0.226804 1.348234
apple 2020-01-01 -0.100000 7.900000 76.000000 0.232394 0.602914
orange 2020-01-01 0.363636 -0.963636 26.500000 0.845070 0.073543
melon 2020-01-01 0.166667 6.166667 27.235043 0.387324 0.470629
apple 2021-01-01 0.328571 26.261702 84.220779 0.252104 0.278661
orange 2021-01-01 0.363636 28.036364 28.683673 0.279008 0.297492
melon 2021-01-01 0.611111 39.944444 27.679487 0.468888 0.423847
sum_wood
id date
apple 2019-01-01 0.000000
orange 2019-01-01 0.097122
melon 2019-01-01 0.902878
apple 2020-01-01 0.585809
orange 2020-01-01 0.204262
melon 2020-01-01 0.209928
apple 2021-01-01 0.599078
orange 2021-01-01 0.204032
melon 2021-01-01 0.196889
Another way is processing all columns together:
cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.join(data2[cols].abs().div(data2.groupby(level = 'date')[cols].transform('sum') )
.rename(columns=lambda x: x.replace('rate','sum')))
cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.assign(**data2[cols].abs().div(data2.groupby(level = 'date')[cols].transform('sum') )
.rename(columns=lambda x: x.replace('rate','sum')))
print (data2)
rate_water rate_fire rate_wood sum_water sum_fire \
id date
apple 2019-01-01 -0.500000 -0.500000 0.000000 -0.340206 0.118962
orange 2019-01-01 -0.636364 -0.963636 3.000000 -0.432990 0.229272
melon 2019-01-01 -0.333333 5.666667 27.888889 -0.226804 1.348234
apple 2020-01-01 -0.100000 7.900000 76.000000 0.232394 0.602914
orange 2020-01-01 0.363636 -0.963636 26.500000 0.845070 0.073543
melon 2020-01-01 0.166667 6.166667 27.235043 0.387324 0.470629
apple 2021-01-01 0.328571 26.261702 84.220779 0.252104 0.278661
orange 2021-01-01 0.363636 28.036364 28.683673 0.279008 0.297492
melon 2021-01-01 0.611111 39.944444 27.679487 0.468888 0.423847
sum_wood
id date
apple 2019-01-01 0.000000
orange 2019-01-01 0.097122
melon 2019-01-01 0.902878
apple 2020-01-01 0.585809
orange 2020-01-01 0.204262
melon 2020-01-01 0.209928
apple 2021-01-01 0.599078
orange 2021-01-01 0.204032
melon 2021-01-01 0.196889
CodePudding user response:
One option is with the assign method and unpacking:
data2.assign(**data2
.abs()
.div(
data2.groupby('date')
.transform('sum'))
.rename(columns = lambda df: df.removeprefix('rate_'))
.add_prefix('sum_'))
rate_water rate_fire rate_wood sum_water sum_fire sum_wood
id date
apple 2019-01-01 -0.500000 -0.500000 0.000000 -0.340206 0.118962 0.000000
orange 2019-01-01 -0.636364 -0.963636 3.000000 -0.432990 0.229272 0.097122
melon 2019-01-01 -0.333333 5.666667 27.888889 -0.226804 1.348234 0.902878
apple 2020-01-01 -0.100000 7.900000 76.000000 0.232394 0.602914 0.585809
orange 2020-01-01 0.363636 -0.963636 26.500000 0.845070 0.073543 0.204262
melon 2020-01-01 0.166667 6.166667 27.235043 0.387324 0.470629 0.209928
apple 2021-01-01 0.328571 26.261702 84.220779 0.252104 0.278661 0.599078
orange 2021-01-01 0.363636 28.036364 28.683673 0.279008 0.297492 0.204032
melon 2021-01-01 0.611111 39.944444 27.679487 0.468888 0.423847 0.196889
Another option would be to concatenate along axis=1:
pd.concat([data2,
data2.abs()
.div(groupby('date')
.transform('sum'))
.rename(columns = lambda df: df.removeprefix('rate_'))
.add_prefix('sum_')],
axis = 1)
Out[103]:
rate_water rate_fire rate_wood sum_water sum_fire sum_wood
id date
apple 2019-01-01 -0.500000 -0.500000 0.000000 -0.340206 0.118962 0.000000
orange 2019-01-01 -0.636364 -0.963636 3.000000 -0.432990 0.229272 0.097122
melon 2019-01-01 -0.333333 5.666667 27.888889 -0.226804 1.348234 0.902878
apple 2020-01-01 -0.100000 7.900000 76.000000 0.232394 0.602914 0.585809
orange 2020-01-01 0.363636 -0.963636 26.500000 0.845070 0.073543 0.204262
melon 2020-01-01 0.166667 6.166667 27.235043 0.387324 0.470629 0.209928
apple 2021-01-01 0.328571 26.261702 84.220779 0.252104 0.278661 0.599078
orange 2021-01-01 0.363636 28.036364 28.683673 0.279008 0.297492 0.204032
melon 2021-01-01 0.611111 39.944444 27.679487 0.468888 0.423847 0.196889
I feel though that it would probably be cleaner to create temporary variables instead, especially if this is going to be in production code