I have a dataframe. What I need is to calculate the difference between the variables A
and B
, and store the result in the new columns based on the variable df['Value']
. If the Value == 1
, then the result is stored in column named Diff_1
, if the Value == 2
, then column Diff_2
, and so on.
Here is the code so far, but obviously the line df_red['Diff_' str(value) ] = df_red['A'] - df_red['B']
is not doing what I want:
import pandas as pd
df = pd.read_excel(r'E:\...\.xlsx')
print(df)
value = list(set(df['Value']))
print(value)
for value in value:
df_red = df[df['Value'] == value]
df_red['Diff_' str(value) ] = df_red['A'] - df_red['B']
Out[126]:
ID Value A B
0 1 1 56.0 49.0
1 2 3 56.0 50.0
2 3 4 103.0 44.0
3 4 2 89.0 44.0
4 5 1 84.0 41.0
5 6 1 77.0 43.0
6 7 2 71.0 35.0
7 8 4 77.0 32.0
print(value)
[1, 2, 3, 4]
After a simple operation of df['A'] - df['B']
the result should look like this.
Out[128]:
ID Value A B Diff_1 Diff_2 Diff_3 Diff_4
0 1 1 56.0 49.0 7.0 0.0 0.0 0.0
1 2 3 56.0 50.0 0.0 0.0 6.0 0.0
2 3 4 103.0 44.0 0.0 0.0 0.0 60.0
3 4 2 89.0 44.0 0.0 45.0 0.0 0.0
4 5 1 84.0 41.0 43.0 0.0 0.0 0.0
5 6 1 77.0 43.0 34.0 0.0 0.0 0.0
6 7 2 71.0 35.0 0.0 36.0 0.0 0.0
7 8 4 77.0 32.0 0.0 0.0 0.0 45.0
Not so great way of doing this would be like this, however I am looking for some more efficient, better ways:
df['Diff_1'] = df[df['Value']==1]['A'] - df[df['Value']==1]['B']
df['Diff_2'] = df[df['Value']==2]['A'] - df[df['Value']==2]['B']
df['Diff_3'] = df[df['Value']==3]['A'] - df[df['Value']==3]['B']
df['Diff_4'] = df[df['Value']==4]['A'] - df[df['Value']==4]['B']
CodePudding user response:
You can use:
df.join(df.set_index(['ID', 'Value'])
.eval('A-B')
.unstack(level=1).add_prefix('Diff_')
.reset_index(drop=True)
)
output:
ID Value A B Diff_1 Diff_2 Diff_3 Diff_4
0 1 1 56.0 49.0 7.0 NaN NaN NaN
1 2 3 56.0 50.0 NaN NaN 6.0 NaN
2 3 4 103.0 44.0 NaN NaN NaN 59.0
3 4 2 89.0 44.0 NaN 45.0 NaN NaN
4 5 1 84.0 41.0 43.0 NaN NaN NaN
5 6 1 77.0 43.0 34.0 NaN NaN NaN
6 7 2 71.0 35.0 NaN 36.0 NaN NaN
7 8 4 77.0 32.0 NaN NaN NaN 45.0
CodePudding user response:
Here is my approach which may not be fastest, but it's a start:
for i in df['Value'].unique():
df.loc[df['Value'] == i, 'Diff_' str(i)] = df['A'] - df['B']
df.fillna(0, inplace = True)
Output of my fake data:
Value A B Diff_1 Diff_2 Diff_3 Diff_4
0 1 20 2 18.0 0.0 0.0 0.0
1 1 30 5 25.0 0.0 0.0 0.0
2 2 40 7 0.0 33.0 0.0 0.0
3 2 50 15 0.0 35.0 0.0 0.0
4 3 60 25 0.0 0.0 35.0 0.0
5 3 20 7 0.0 0.0 13.0 0.0
6 4 15 36 0.0 0.0 0.0 -21.0
7 4 14 3 0.0 0.0 0.0 11.0