Home > Back-end >  operation between columns according to the value it contains
operation between columns according to the value it contains

Time:10-28

I have a Dataframe that look like this:

df_1:

    Phase_1  Phase_2   Phase_3
0      8       4        2
1      4       6        3
2      8       8        3
3      10      5        8
...

I'd like to add a column called "Coeff" that compute (Phase_max - Phase_min) / Phase_max

For the first row: Coeff= (Phase_1 - Phase_3)/ Phase_1 = (8-2)/8 = 0.75

Expected OUTPUT:

df_1

    Phase_1  Phase_2   Phase_3   Coeff
0      8       4        2       0.75
1      4       6        3       0.5
2      8       8        3       0.625
3      10      5        8       0.5

What is the best way to compute this without using loop? I want to apply it on large dataset.

CodePudding user response:

here is one way to do it

# list the columns, you like to use in calculations
cols=['Phase_1', 'Phase_2', 'Phase_3']

# using max and min across the axis to calculate, for the defined columns

df['coeff']=(df[cols].max(axis=1).sub(df[cols].min(axis=1))).div(df[cols].max(axis=1))
df

little performance optimization (credit Yevhen Kuzmovych)

df['coeff']= 1 - (df[cols].min(axis=1).div(df[cols].max(axis=1)))

df
    Phase_1     Phase_2     Phase_3     coeff
0         8           4           2     0.750
1         4           6           3     0.500
2         8           8           3     0.625
3         10          5           8     0.500

CodePudding user response:

As per OP specification

I only want the max or the min between Phase_1 Phase_2 and Phase_3 and not other columns

The following will do the work

columns = ['Phase_1', 'Phase_2', 'Phase_3']
max_phase = df[columns].max(axis = 1)
min_phase = df[columns].min(axis = 1)
df['Coeff'] = (max_phase - min_phase) / max_phase

# or

max_phase = df[['Phase_1', 'Phase_2', 'Phase_3']].max(axis = 1)
min_phase = df[['Phase_1', 'Phase_2', 'Phase_3']].min(axis = 1)
df['Coeff'] = (max_phase - min_phase) / max_phase

# or

df['Coeff'] = (df[['Phase_1', 'Phase_2', 'Phase_3']].max(axis = 1) - df[['Phase_1', 'Phase_2', 'Phase_3']].min(axis = 1)) / df[['Phase_1', 'Phase_2', 'Phase_3']].max(axis = 1)

[Out]:

   Phase_1  Phase_2  Phase_3  Coeff
0        8        4        2  0.750
1        4        6        3  0.500
2        8        8        3  0.625
3       10        5        8  0.500

Another alternative would be to use numpy built-in modules, as follows

columns = ['Phase_1', 'Phase_2', 'Phase_3']
max_phase = np.max(df[columns], axis = 1)
min_phase = np.min(df[columns], axis = 1)
df['Coeff'] = (max_phase - min_phase) / max_phase

# or

max_phase = np.max(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)
min_phase = np.min(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)
df['Coeff'] = (max_phase - min_phase) / max_phase

# or

df['Coeff'] = (np.max(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1) - np.min(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)) / np.max(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)

[Out]:

   Phase_1  Phase_2  Phase_3  Coeff
0        8        4        2  0.750
1        4        6        3  0.500
2        8        8        3  0.625
3       10        5        8  0.500
  • Related