I have a Dataframe that look like this:
df_1:
Phase_1 Phase_2 Phase_3
0 8 4 2
1 4 6 3
2 8 8 3
3 10 5 8
...
I'd like to add a column called "Coeff" that compute (Phase_max - Phase_min) / Phase_max
For the first row: Coeff= (Phase_1 - Phase_3)/ Phase_1 = (8-2)/8 = 0.75
Expected OUTPUT:
df_1
Phase_1 Phase_2 Phase_3 Coeff
0 8 4 2 0.75
1 4 6 3 0.5
2 8 8 3 0.625
3 10 5 8 0.5
What is the best way to compute this without using loop? I want to apply it on large dataset.
CodePudding user response:
here is one way to do it
# list the columns, you like to use in calculations
cols=['Phase_1', 'Phase_2', 'Phase_3']
# using max and min across the axis to calculate, for the defined columns
df['coeff']=(df[cols].max(axis=1).sub(df[cols].min(axis=1))).div(df[cols].max(axis=1))
df
little performance optimization (credit Yevhen Kuzmovych)
df['coeff']= 1 - (df[cols].min(axis=1).div(df[cols].max(axis=1)))
df
Phase_1 Phase_2 Phase_3 coeff
0 8 4 2 0.750
1 4 6 3 0.500
2 8 8 3 0.625
3 10 5 8 0.500
CodePudding user response:
As per OP specification
I only want the max or the min between Phase_1 Phase_2 and Phase_3 and not other columns
The following will do the work
columns = ['Phase_1', 'Phase_2', 'Phase_3']
max_phase = df[columns].max(axis = 1)
min_phase = df[columns].min(axis = 1)
df['Coeff'] = (max_phase - min_phase) / max_phase
# or
max_phase = df[['Phase_1', 'Phase_2', 'Phase_3']].max(axis = 1)
min_phase = df[['Phase_1', 'Phase_2', 'Phase_3']].min(axis = 1)
df['Coeff'] = (max_phase - min_phase) / max_phase
# or
df['Coeff'] = (df[['Phase_1', 'Phase_2', 'Phase_3']].max(axis = 1) - df[['Phase_1', 'Phase_2', 'Phase_3']].min(axis = 1)) / df[['Phase_1', 'Phase_2', 'Phase_3']].max(axis = 1)
[Out]:
Phase_1 Phase_2 Phase_3 Coeff
0 8 4 2 0.750
1 4 6 3 0.500
2 8 8 3 0.625
3 10 5 8 0.500
Another alternative would be to use numpy built-in modules, as follows
columns = ['Phase_1', 'Phase_2', 'Phase_3']
max_phase = np.max(df[columns], axis = 1)
min_phase = np.min(df[columns], axis = 1)
df['Coeff'] = (max_phase - min_phase) / max_phase
# or
max_phase = np.max(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)
min_phase = np.min(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)
df['Coeff'] = (max_phase - min_phase) / max_phase
# or
df['Coeff'] = (np.max(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1) - np.min(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)) / np.max(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)
[Out]:
Phase_1 Phase_2 Phase_3 Coeff
0 8 4 2 0.750
1 4 6 3 0.500
2 8 8 3 0.625
3 10 5 8 0.500