I want to create a function in Python that normalizes the values of several columns depending on a condition.
As an example, take the following df (mine has 24 columns in total: 23 int and 1 object):
Column A | Column B | Column C |
---|---|---|
2 | 4 | A |
3 | 3 | B |
0 | 0.4 | A |
5 | 7 | B |
3 | 2 | A |
6 | 0 | B |
Let's say that I want to create a new df with the values of Column A and Column B after dividing them by a factor X or Y, depending on whether Column C is A or B. That is, if Column C is A the factor is X, and if Column C is B the factor is Y.
I have tried different versions of a function:
def normalized_new (columns):
    for col in df.columns:
        if df.loc[df['Column C'] =='A']:
            col=df[col]/X
        elif df.loc[df['Column C'] =='B']:
            col=df[col]/Y
        else: pass
    return columns
normalized_new (df)
And the other one I tried:
def new_norm (prog):
    if df.loc[(df['Column C']=='A')]:
        prog = 1/X
    elif df.loc[(df['Column C']=='B')]:
        prog = 1/Y
    else: print('this function doesnt work well')
    return (prog)
for col in df.columns:
    df[col]=new_norm(df)
For both functions I always get the same ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Could you help me understand what is going on here? Is there any other way to create a df with the desired output?
Thank you so much in advance!
CodePudding user response:
Try using np.where with .div:
import numpy as np

X = 10
Y = -10
# Pick the divisor row by row: X where Column C is "A", otherwise Y.
df[["Column A", "Column B"]] = df[["Column A", "Column B"]].div(
    np.where(df["Column C"].eq("A"), X, Y), axis=0
)
print(df)
Prints:
   Column A  Column B Column C
0       0.2      0.40        A
1      -0.3     -0.30        B
2       0.0      0.04        A
3      -0.5     -0.70        B
4       0.3      0.20        A
5      -0.6     -0.00        B
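As for the ValueError in the question: `if` needs a single True or False, but `df.loc[df['Column C'] == 'A']` is a whole filtered DataFrame, so pandas refuses to guess what its truth value should be. Below is a minimal sketch that reproduces the error and then builds a new df instead of overwriting the original. It assumes the sample data from the question, the same illustrative X and Y as above, and that every numeric column should be normalized; `numeric_cols`, `factor` and `normalized` are just names picked for this example.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Column A": [2, 3, 0, 5, 3, 6],
    "Column B": [4, 3, 0.4, 7, 2, 0],
    "Column C": ["A", "B", "A", "B", "A", "B"],
})
X, Y = 10, -10  # illustrative factors, same as above

# A filtered DataFrame has many values, so it cannot be used as an `if` condition.
try:
    if df.loc[df["Column C"] == "A"]:
        pass
except ValueError as err:
    error_message = str(err)

# Assumption: all numeric columns get normalized (23 of them in the real data).
numeric_cols = list(df.select_dtypes(include="number").columns)

# One divisor per row: X where Column C is "A", otherwise Y.
factor = np.where(df["Column C"].eq("A"), X, Y)

# Work on a copy so the original df stays untouched.
normalized = df.copy()
normalized[numeric_cols] = df[numeric_cols].div(factor, axis=0)
```

Note that `np.where` with two branches sends every row that is not "A" to Y, so this only fits if Column C really contains nothing but A and B.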
CodePudding user response:
Would you consider using apply with a custom function that computes the new column from the whole row? This makes the logic easier to read. For example:
X=10
Y=5
def new_norm(row):
    #put your if/elif logic here, for example:
    if row['Column C'] == 'A':
        return row['Column A']/X #don't forget to return value for new column
    ....
df['newcol'] = df.apply(new_norm, axis=1) #call function for each row and add column 'newcol'
A function like this also lets you handle edge cases (for example an empty Column C, or a value other than A or B).
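To make the row-wise idea concrete, here is one possible way to fill it in. It is only a sketch. It assumes the factors 10 and 5 from the snippet above, normalizes both Column A and Column B, and returns NaN for any row whose Column C is neither A nor B. The `FACTORS` and `VALUE_COLS` names and the last row of the sample data (category "C") are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one hypothetical row with an unknown category.
df = pd.DataFrame({
    "Column A": [2, 3, 0, 5, 3, 6, 1],
    "Column B": [4, 3, 0.4, 7, 2, 0, 1],
    "Column C": ["A", "B", "A", "B", "A", "B", "C"],
})

FACTORS = {"A": 10, "B": 5}           # X and Y, keyed by the value in Column C
VALUE_COLS = ["Column A", "Column B"]  # columns to normalize (assumption)

def new_norm(row):
    """Divide the value columns of one row by the factor for its category."""
    factor = FACTORS.get(row["Column C"])
    if factor is None:
        # Edge case: missing or unexpected category, so return NaN for that row.
        return pd.Series(np.nan, index=VALUE_COLS)
    return row[VALUE_COLS] / factor

# apply with axis=1 calls new_norm once per row; returning a Series per row
# yields a new DataFrame with one column per entry in VALUE_COLS.
normalized = df.apply(new_norm, axis=1).astype(float)
```

Keep in mind that apply with axis=1 runs a Python function for every row, so on large frames it is usually much slower than a vectorized approach; the trade-off is that the branching logic is easy to read and extend.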