As the title of the question suggests, the situation is this:
Creating the DataFrame:
import pandas as pd
df = pd.DataFrame({'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
                   'b': [45, 34, 556, 32, 97, 33],
                   'c': [234, 66, 12, 44, 99, 3],
                   'd': [123, 45, 55, 98, 17, 22]})
df
Output:
a b c d
0 one 45 234 123
1 one 34 66 45
2 three 556 12 55
3 two 32 44 98
4 eleven 97 99 17
5 two 33 3 22
Let's say I want to add a column 'e' which is the sum of the columns 'b', 'c' and 'd'. It's simple:
df['e'] = df.b + df.c + df.d
df
Output:
a b c d e
0 one 45 234 123 402
1 one 34 66 45 145
2 three 556 12 55 623
3 two 32 44 98 174
4 eleven 97 99 17 213
5 two 33 3 22 58
Now I want one more column 'f', but based on the following conditions:
if df.a == 'one' and df.b < 50:
    df['f'] = 0
elif df.a == 'two' and df.d > 50:
    df['f'] = 1
else:
    df['f'] = 2
But of course this code does not work:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(),
a.item(), a.any() or a.all().
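If I understand the error correctly, each comparison such as df.a == 'one' returns a whole boolean Series, and Python's and / if cannot reduce a Series to a single True or False. For example:
mask = df.a == 'one'
type(mask)   # <class 'pandas.core.series.Series'>
# bool(mask) would raise the same ValueError shown above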
How could those conditions be implemented correctly?
CodePudding user response:
You can use np.select for this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
                   'b': [45, 34, 556, 32, 97, 33],
                   'c': [234, 66, 12, 44, 99, 3],
                   'd': [123, 45, 55, 98, 17, 22]})
df['e'] = df.b + df.c + df.d
# list with your conditions
conditions = [(df.a == 'one') & (df.b < 50),
              (df.a == 'two') & (df.d > 50)]
# list with accompanying choices
choices = [0, 1]
df['f'] = np.select(conditions, choices, 2)
# 2 being the default: i.e. the 'else' choice.
df
a b c d e f
0 one 45 234 123 402 0
1 one 34 66 45 145 0
2 three 556 12 55 623 2
3 two 32 44 98 174 1
4 eleven 97 99 17 213 2
5 two 33 3 22 58 2
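np.select goes through the conditions in order and, for every row, picks the value from choices belonging to the first condition that is True; rows that match none of the conditions get the default (2 here), which plays the role of the else branch.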
CodePudding user response:
You can use nested np.where calls:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
                   'b': [45, 34, 556, 32, 97, 33],
                   'c': [234, 66, 12, 44, 99, 3],
                   'd': [123, 45, 55, 98, 17, 22]})
df['e'] = df.b + df.c + df.d
df['f'] = np.where(
    (df.a == 'one') & (df.b < 50),
    0,
    np.where(
        (df.a == 'two') & (df.d > 50),
        1,
        2
    )
)
Output:
a b c d e f
0 one 45 234 123 402 0
1 one 34 66 45 145 0
2 three 556 12 55 623 2
3 two 32 44 98 174 1
4 eleven 97 99 17 213 2
5 two 33 3 22 58 2
CodePudding user response:
# row-wise version of the original if/elif/else
def setter(x):
    if x.a == 'one' and x.b < 50:
        return 0
    elif x.a == 'two' and x.d > 50:
        return 1
    else:
        return 2

df['f'] = df.apply(setter, axis=1)
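Since axis=1 passes each row to setter as a Series, this is the most literal translation of the original if/elif/else, but the function is called once per row in plain Python, so it will be slower than the vectorised np.select / np.where answers on large DataFrames. The resulting column f is the same.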
CodePudding user response:
One option is case_when from pyjanitor; it is a wrapper around pd.Series.mask and, as much as possible, passes all the hard work of dtypes and the like to pandas:
# pip install pyjanitor
import pandas as pd
import janitor
(df
 .assign(e=df.b + df.c + df.d)
 # case_when is an alternation
 # of conditions and expected values
 .case_when(df.a.eq('one') & df.b.lt(50), 0,  # condition, value
            df.a.eq('two') & df.d.gt(50), 1,
            2,  # default
            column_name='f')
)
a b c d e f
0 one 45 234 123 402 0
1 one 34 66 45 145 0
2 three 556 12 55 623 2
3 two 32 44 98 174 1
4 eleven 97 99 17 213 2
5 two 33 3 22 58 2
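Since case_when is essentially a wrapper around pd.Series.mask, a rough plain-pandas equivalent (just a sketch, no pyjanitor required) could look like this; the masks are applied in reverse order so that the first condition wins, mirroring the if/elif/else:
f = pd.Series(2, index=df.index)               # start from the default ('else')
f = f.mask(df.a.eq('two') & df.d.gt(50), 1)    # second condition
f = f.mask(df.a.eq('one') & df.b.lt(50), 0)    # first condition takes precedence
df['f'] = f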