Store the result in new columns, named based on another variable (Pandas)-CodePudding

I have a dataframe. What I need is to calculate the difference between the variables A and B, and store the result in the new columns based on the variable df['Value']. If the Value == 1, then the result is stored in column named Diff_1, if the Value == 2, then column Diff_2, and so on.

Here is the code so far, but obviously the line df_red['Diff_' str(value) ] = df_red['A'] - df_red['B'] is not doing what I want:

import pandas as pd

df = pd.read_excel(r'E:\...\.xlsx')
print(df)

value = list(set(df['Value']))
print(value)

for value in value:
    df_red = df[df['Value'] == value]
    df_red['Diff_'   str(value) ] = df_red['A'] - df_red['B']


Out[126]: 
   ID  Value      A     B
0   1      1   56.0  49.0
1   2      3   56.0  50.0
2   3      4  103.0  44.0
3   4      2   89.0  44.0
4   5      1   84.0  41.0
5   6      1   77.0  43.0
6   7      2   71.0  35.0
7   8      4   77.0  32.0
print(value)
[1, 2, 3, 4]

After a simple operation of df['A'] - df['B'] the result should look like this.

Out[128]: 
   ID  Value      A     B  Diff_1  Diff_2  Diff_3  Diff_4
0   1      1   56.0  49.0     7.0     0.0     0.0     0.0
1   2      3   56.0  50.0     0.0     0.0     6.0     0.0
2   3      4  103.0  44.0     0.0     0.0     0.0    60.0
3   4      2   89.0  44.0     0.0    45.0     0.0     0.0
4   5      1   84.0  41.0    43.0     0.0     0.0     0.0
5   6      1   77.0  43.0    34.0     0.0     0.0     0.0
6   7      2   71.0  35.0     0.0    36.0     0.0     0.0
7   8      4   77.0  32.0     0.0     0.0     0.0    45.0

Not so great way of doing this would be like this, however I am looking for some more efficient, better ways:

df['Diff_1'] = df[df['Value']==1]['A'] - df[df['Value']==1]['B']
df['Diff_2'] = df[df['Value']==2]['A'] - df[df['Value']==2]['B']
df['Diff_3'] = df[df['Value']==3]['A'] - df[df['Value']==3]['B']
df['Diff_4'] = df[df['Value']==4]['A'] - df[df['Value']==4]['B']

CodePudding user response：

You can use:

df.join(df.set_index(['ID', 'Value'])
          .eval('A-B')
          .unstack(level=1).add_prefix('Diff_')
          .reset_index(drop=True)
        )

output:

   ID  Value      A     B  Diff_1  Diff_2  Diff_3  Diff_4
0   1      1   56.0  49.0     7.0     NaN     NaN     NaN
1   2      3   56.0  50.0     NaN     NaN     6.0     NaN
2   3      4  103.0  44.0     NaN     NaN     NaN    59.0
3   4      2   89.0  44.0     NaN    45.0     NaN     NaN
4   5      1   84.0  41.0    43.0     NaN     NaN     NaN
5   6      1   77.0  43.0    34.0     NaN     NaN     NaN
6   7      2   71.0  35.0     NaN    36.0     NaN     NaN
7   8      4   77.0  32.0     NaN     NaN     NaN    45.0

CodePudding user response：

Here is my approach which may not be fastest, but it's a start:

for i in df['Value'].unique():
   df.loc[df['Value'] == i, 'Diff_'   str(i)] = df['A'] - df['B']
   df.fillna(0, inplace = True)

Output of my fake data:

  Value A   B   Diff_1  Diff_2  Diff_3  Diff_4
0   1   20  2   18.0    0.0     0.0     0.0
1   1   30  5   25.0    0.0     0.0     0.0
2   2   40  7   0.0     33.0    0.0     0.0
3   2   50  15  0.0     35.0    0.0     0.0
4   3   60  25  0.0     0.0     35.0    0.0
5   3   20  7   0.0     0.0     13.0    0.0
6   4   15  36  0.0     0.0     0.0     -21.0
7   4   14  3   0.0     0.0     0.0     11.0