Function with for loop and logical operators

I want to create a function in Python that normalizes the values of several columns based on a specific condition.

As an example, take the following df. My real one has 24 columns in total (23 int and 1 object).

Column A	Column B	Column C
2	4	A
3	3	B
0	0.4	A
5	7	B
3	2	A
6	0	B

Let's say I want to create a new df with the values of Column A and Column B divided by a factor X or Y, depending on whether Column C is A or B: if Column C is A the factor is X, and if Column C is B the factor is Y.

I have tried different versions of a function:

def normalized_new (columns):
    for col in df.columns:
        if df.loc[df['Column C'] =='A']:
            col=df[col]/X
        elif df.loc[df['Column C'] =='B']:
            col=df[col]/Y
        else:
            pass
    return columns

normalized_new (df)

The other version I tried:

def new_norm (prog):
    if df.loc[(df['Column C']=='A')]:
        prog = 1/X
    elif df.loc[(df['Column C']=='B')]:
        prog = 1/Y
    else:
        print('this function doesnt work well')
    return (prog)

for col in df.columns:
    df[col]=new_norm(df)

With both functions I always get the same ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Could you help me understand what is going on here? Is there another way to create a df with the desired output?

Thank you so much in advance!

CodePudding user response：

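Try np.where together with .div:

import numpy as np

X = 10
Y = -10

# np.where builds one divisor per row: X where Column C is "A", otherwise Y
df[["Column A", "Column B"]] = df[["Column A", "Column B"]].div(
    np.where(df["Column C"].eq("A"), X, Y), axis=0
)
print(df)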

Prints:

   Column A  Column B Column C
0       0.2      0.40        A
1      -0.3     -0.30        B
2       0.0      0.04        A
3      -0.5     -0.70        B
4       0.3      0.20        A
5      -0.6     -0.00        B
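
As for the error itself: df.loc[df['Column C'] == 'A'] returns a DataFrame containing the matching rows, and an if statement needs a single True/False, so pandas raises the ValueError. Below is a minimal sketch of that, rebuilding the example frame from the question. The X and Y values are the illustrative ones used above, and value_cols, is_a and result are names made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Column A": [2, 3, 0, 5, 3, 6],
        "Column B": [4, 3, 0.4, 7, 2, 0],
        "Column C": ["A", "B", "A", "B", "A", "B"],
    }
)
X, Y = 10, -10  # illustrative factors, same as above

# This is a DataFrame of the matching rows, not a single boolean.
subset = df.loc[df["Column C"] == "A"]

error_message = None
try:
    if subset:  # `if` asks the DataFrame for one truth value
        pass
except ValueError as err:
    error_message = str(err)  # the "truth value ... is ambiguous" message

# Alternative without `if`: use the boolean mask to pick the rows to update.
value_cols = ["Column A", "Column B"]  # hypothetical name for this sketch
result = df.copy()
result[value_cols] = result[value_cols].astype(float)  # avoid int/float mixing
is_a = result["Column C"] == "A"
result.loc[is_a, value_cols] = result.loc[is_a, value_cols] / X
result.loc[~is_a, value_cols] = result.loc[~is_a, value_cols] / Y
```

The mask is_a is evaluated row by row, which is what the if/elif in the question was trying to express for the whole frame at once.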

CodePudding user response：

Would you consider using apply with a custom function that computes the new column from the whole row? That makes the logic easier to read. For example:

X = 10
Y = 5

def new_norm(row):
    # put your if/elif logic here, for example:
    if row['Column C'] == 'A':
        return row['Column A'] / X  # don't forget to return the value for the new column
    ...  # remaining elif/else branches go here

# call the function for each row and add column 'newcol'
df['newcol'] = df.apply(new_norm, axis=1)

A function like this lets you handle edge cases, for example an empty Column C or a value other than A or B.
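
The same edge cases can probably also be covered without apply. The sketch below assumes a lookup dict (called factors here, which is not part of the original post) and a small made-up frame with one missing category. Series.map leaves NaN for any value of Column C that is not in the dict, so those rows come out as NaN instead of raising.

```python
import pandas as pd

X, Y = 10, 5  # illustrative factors, as in the answer above

df = pd.DataFrame(
    {
        "Column A": [2, 3, 0, 5],
        "Column B": [4, 3, 0.4, 7],
        "Column C": ["A", "B", "A", None],  # last row: missing category
    }
)

factors = {"A": X, "B": Y}  # hypothetical lookup table for this sketch
divisor = df["Column C"].map(factors)  # NaN where Column C is not in the dict

value_cols = ["Column A", "Column B"]
normalized = df[value_cols].div(divisor, axis=0)  # row-wise division
normalized["Column C"] = df["Column C"]  # carry the category column over
```

Rows with an unknown or empty Column C can then be found with normalized["Column A"].isna() and handled separately.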