Suppose I have the following DataFrame:
import pandas as pd

data = {"age": [2, 3, 2, 5, 9, 12, 20, 43, 55, 60], "alpha": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data)
I want to change the values of column alpha based on column age, using df.loc and an arithmetic sequence, but I get a syntax error:
df.loc[((df.age <=4)) , "alpha"] = ".4"
df.loc[((df.age >= 5)) & ((df.age <= 20)), "alpha"] = 0.4 + (1 - 0.4)*((df$age - 4)/(20 - 4))
df.loc[((df.age > 20)) , "alpha"] = "1"
Thank you in advance.
CodePudding user response:
Reference the age column using a . (Python attribute access), not a $ (which is R syntax):
df.loc[((df.age >= 5)) & ((df.age <= 20)), "alpha"] = 0.4 + (1 - 0.4)*((df.age - 4)/(20 - 4))
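For reference, here is a minimal sketch of all three .loc assignments with the $ fixed. It assumes numeric values (0.4 and 1.0) are wanted rather than the strings ".4" and "1" from the question, and the mask names young, middle and old are only illustrative:

```python
import pandas as pd

data = {"age": [2, 3, 2, 5, 9, 12, 20, 43, 55, 60], "alpha": [0] * 10}
df = pd.DataFrame(data)

# Assumption: alpha should be numeric, so make the column float up front
df["alpha"] = df["alpha"].astype(float)

# Illustrative mask names, not from the original question
young = df["age"] <= 4
middle = (df["age"] >= 5) & (df["age"] <= 20)
old = df["age"] > 20

df.loc[young, "alpha"] = 0.4
# Linear ramp from 0.4 at age 4 up to 1.0 at age 20;
# the right-hand Series is aligned to the selected rows by index
df.loc[middle, "alpha"] = 0.4 + (1 - 0.4) * ((df["age"] - 4) / (20 - 4))
df.loc[old, "alpha"] = 1.0

print(df)
```

Keeping the column as float means later arithmetic on alpha works without converting from text first.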
CodePudding user response:
Instead of multiple .loc assignments, you can combine all the conditions at once using chained np.where clauses:
import numpy as np

df['alpha'] = np.where(df.age <= 4, ".4",
              np.where((df.age >= 5) & (df.age <= 20),
                       0.4 + (1 - 0.4) * ((df.age - 4) / (20 - 4)),
              np.where(df.age > 20, "1", df.alpha)))
print(df)
   age   alpha
0    2      .4
1    3      .4
2    2      .4
3    5  0.4375
4    9  0.5875
5   12     0.7
6   20     1.0
7   43       1
8   55       1
9   60       1
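Note that the output above shows .4 and 1 as text, because the string literals ".4" and "1" were mixed with floats. If a numeric column is what you are after (an assumption on my part), a sketch of the same chained np.where with float literals could look like this; the name ramp is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [2, 3, 2, 5, 9, 12, 20, 43, 55, 60]})

# Linear ramp from 0.4 at age 4 up to 1.0 at age 20
ramp = 0.4 + (1 - 0.4) * ((df["age"] - 4) / (20 - 4))

# The inner np.where is only reached for age > 4, so "<= 20" is enough there
df["alpha"] = np.where(df["age"] <= 4, 0.4,
              np.where(df["age"] <= 20, ramp, 1.0))

print(df.dtypes)
```

With only floats in every branch, alpha ends up as a float64 column.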
CodePudding user response:
Besides the syntax error (due to the $), to reduce visual noise I would go for numpy.select:
import numpy as np

conditions = [df["age"].le(4),
              df["age"].gt(4) & df["age"].le(20),
              df["age"].gt(20)]
values = [".4", 0.4 + (1 - 0.4) * ((df["age"] - 4) / (20 - 4)), 1]
df["alpha"] = np.select(condlist=conditions, choicelist=values)
Output:
print(df)
   age   alpha
0    2      .4
1    3      .4
2    2      .4
3    5  0.4375
4    9  0.5875
5   12     0.7
6   20     1.0
7   43       1
8   55       1
9   60       1
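As a side note, the three rules together describe a clamped linear interpolation: 0.4 up to age 4, a straight line to 1.0 at age 20, and 1.0 afterwards. If numeric output is acceptable (an assumption, since the question used strings), np.interp is a different technique from np.select that can express this in one call. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [2, 3, 2, 5, 9, 12, 20, 43, 55, 60]})

# np.interp interpolates linearly between the points (4, 0.4) and (20, 1.0)
# and holds the end values outside that range
df["alpha"] = np.interp(df["age"], [4, 20], [0.4, 1.0])

print(df)
```

Unlike the condition lists above, this also assigns a value to non-integer ages between 4 and 5, which may or may not be what you want.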