My aim is to put values in column D based on columns A to C. I want to go through each of the columns A, B and C and add 1 to column D if the value is greater than 20
import pandas as pd
data={'A':[5,2,25,4],"B":[15,22,100,24], "C":[4, 100, 0, 19], "D" : [0,0,0,0]}
df= pd.DataFrame(data)
for x in range(0, len(df.columns)):
if df.iloc[:, x] > 20:
df["D"] = df["D"] 1
else:
df["D"] = df["D"]
df
So, based on the sample code, Column D should look like
D |
---|
0 |
2 |
2 |
1 |
However, i keep getting this error - ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I have gone through other answers here (Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()) which indicate that it is a multiple operator issue but I am not using these here.
what I am doing wrong?
CodePudding user response:
What is wrong is that you can't use if Series > value
in a vectorial way. if
expects a single boolean value and df.iloc[:, x] > 20
returns a Series of booleans.
In your case use:
df['D'] = df.drop(columns='D').gt(20).sum(1)
NB. If column D is initially null or inexistent, skip the .drop(columns='D')
.
Output:
A B C D
0 5 15 4 0
1 2 22 100 2
2 25 100 0 2
3 4 24 19 1
CodePudding user response:
Another possible solution:
df.D = (
df.loc[:, ~ df.columns.str.contains('D')]
.apply(lambda x: sum(x > 20), axis = 1))
Output:
A B C D
0 5 15 4 0
1 2 22 100 2
2 25 100 0 2
3 4 24 19 1