Home > database >  Adding values to a columns based on other columns in the same dataframe
Adding values to a columns based on other columns in the same dataframe

Time:08-13

My aim is to put values in column D based on columns A to C. I want to go through each of the columns A, B and C and add 1 to column D if the value is greater than 20

import pandas as pd
data={'A':[5,2,25,4],"B":[15,22,100,24], "C":[4, 100, 0, 19], "D" : [0,0,0,0]}
df= pd.DataFrame(data)

for x in range(0, len(df.columns)):
    if df.iloc[:, x]  > 20:
        df["D"] = df["D"]  1
    else:
        df["D"] = df["D"]
df

So, based on the sample code, Column D should look like

D
0
2
2
1

However, i keep getting this error - ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I have gone through other answers here (Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()) which indicate that it is a multiple operator issue but I am not using these here.

what I am doing wrong?

CodePudding user response:

What is wrong is that you can't use if Series > value in a vectorial way. if expects a single boolean value and df.iloc[:, x] > 20 returns a Series of booleans.

In your case use:

df['D'] = df.drop(columns='D').gt(20).sum(1)

NB. If column D is initially null or inexistent, skip the .drop(columns='D').

Output:

    A    B    C  D
0   5   15    4  0
1   2   22  100  2
2  25  100    0  2
3   4   24   19  1

CodePudding user response:

Another possible solution:

df.D = (
  df.loc[:, ~ df.columns.str.contains('D')]
  .apply(lambda x: sum(x > 20), axis = 1))

Output:

    A    B    C  D
0   5   15    4  0
1   2   22  100  2
2  25  100    0  2
3   4   24   19  1
  • Related