Use nested if condition in dataframe

Time:11-30

I want to set df['good_day'] = 1 when all of these conditions hold:

  • sunshine >= 8
  • rain <= 3
  • temperatur >= 15 and temperatur <= 25

and df['good_day'] = 0 otherwise.

import pandas as pd

d = {'sunshine': [8, 9, 6, 5],
     'rain': [2, 4, 0, 8],
     'temperatur': [22, 32, -12, 4],
     'id': [1, 2, 3, 4]}
df = pd.DataFrame(data=d)

print(df)

   sunshine  rain  temperatur  id
0         8     2          22   1
1         9     4          32   2
2         6     0         -12   3
3         5     8           4   4

This is what I tried:

df['good_day'] = 0

df.loc[df['sunshine'] >= 8, 'good_day'] = 1
df.loc[df['rain'] <= 3, 'good_day'] = 1
df['good_day'] = df.where(df['temperatur'].between(15, 25), 1)
                  
print(df)

   sunshine  rain  temperatur  id  good_day
0         8     2          22   1         8
1         9     4          32   2         1
2         6     0         -12   3         1
3         5     8           4   4         1

What I want

   sunshine  rain  temperatur  id  good_day
0         8     2          22   1         1
1         9     4          32   2         0
2         6     0         -12   3         0
3         5     8           4   4         0

CodePudding user response:

Combine simple comparisons with the element-wise logical AND operator (&), then convert the resulting True/False values to 1/0:

mask = df['sunshine'].ge(8) & df['rain'].le(3) & df['temperatur'].between(15, 25)
df['good_day'] = mask.astype(int)

output:

   sunshine  rain  temperatur  id  good_day
0         8     2          22   1         1
1         9     4          32   2         0
2         6     0         -12   3         0
3         5     8           4   4         0
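
For reference, here is the same approach as a self-contained script. It is a minimal sketch that assumes only pandas and the example data from the question; the intermediate names enough_sun, little_rain and mild are made up for illustration. Each comparison yields a boolean Series, and & combines them element-wise, so a row is True only when all three conditions hold.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    'sunshine': [8, 9, 6, 5],
    'rain': [2, 4, 0, 8],
    'temperatur': [22, 32, -12, 4],
    'id': [1, 2, 3, 4],
})

# Each comparison returns a boolean Series aligned with df's index.
# (Illustrative names, not part of the original answer.)
enough_sun = df['sunshine'] >= 8
little_rain = df['rain'] <= 3
mild = df['temperatur'].between(15, 25)  # inclusive on both ends by default

# & is the element-wise AND; astype(int) maps True/False to 1/0.
df['good_day'] = (enough_sun & little_rain & mild).astype(int)

print(df)
```

Python's plain `and` cannot be used in place of & here: it tries to reduce each Series to a single truth value and raises a ValueError.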

CodePudding user response:

Alternatively, you can evaluate the conditions row by row with apply:

df['good_day'] = df.apply(lambda x: int(x['sunshine'] >= 8 and x['rain'] <= 3 and x['temperatur'] >= 15 and x['temperatur'] <= 25), axis=1)

That said, I prefer the solution proposed above.

I checked the timings:

from timeit import timeit
# Vectorized comparisons combined with &
timeit(lambda: (df['sunshine'].ge(8) & df['rain'].le(3)
                & df['temperatur'].ge(15) & df['temperatur'].le(25)).astype('int'),
       number=1_000)
# This returns 0.666049400002521
# Row-wise apply
timeit(lambda: df.apply(lambda x: int(x['sunshine'] >= 8 and x['rain'] <= 3
                                      and x['temperatur'] >= 15 and x['temperatur'] <= 25),
                        axis=1), number=1_000)
# This returns 0.33816790000128094
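
These numbers presumably come from the 4-row example frame, where fixed per-call overhead dominates, so they say little about larger data. As a rough sketch of how the comparison could be repeated on a bigger frame, the snippet below builds random data and first checks that both approaches agree. The frame size n, the seed, the value ranges, and the helper names vectorized and rowwise are all arbitrary assumptions. No timings are quoted because they depend on the machine and the pandas version.

```python
from timeit import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n = 5_000                       # hypothetical frame size for the comparison

# Random data with the same column names as the question (ranges are made up)
big = pd.DataFrame({
    'sunshine': rng.integers(0, 13, size=n),
    'rain': rng.integers(0, 11, size=n),
    'temperatur': rng.integers(-15, 36, size=n),
})

def vectorized(frame):
    """Column-wise comparisons combined with the element-wise AND."""
    mask = (frame['sunshine'].ge(8) & frame['rain'].le(3)
            & frame['temperatur'].ge(15) & frame['temperatur'].le(25))
    return mask.astype(int)

def rowwise(frame):
    """The same conditions evaluated one row at a time with apply."""
    return frame.apply(
        lambda x: int(x['sunshine'] >= 8 and x['rain'] <= 3
                      and x['temperatur'] >= 15 and x['temperatur'] <= 25),
        axis=1)

# Both approaches must produce identical flags before timing them.
assert (vectorized(big) == rowwise(big)).all()

t_vec = timeit(lambda: vectorized(big), number=3)
t_row = timeit(lambda: rowwise(big), number=3)
print(f"vectorized: {t_vec:.4f} s, apply: {t_row:.4f} s")
```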

CodePudding user response:

Another alternative is to use eval, which I find quite readable. Using query is also pretty handy.

expr = "sunshine >= 8 & rain <= 3 & 15 <= temperatur <= 25"
mask = df.eval(expr)

df.loc[mask]      # Gets the rows with the relevant conditions
df.loc[~mask]     # Gets the rows which do not conform to the conditions

The output is then:

df.loc[mask]

   sunshine  rain  temperatur  id
0         8     2          22   1

and for the negation:

df.loc[~mask]

   sunshine  rain  temperatur  id
1         9     4          32   2
2         6     0         -12   3
3         5     8           4   4
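
To get the 0/1 column the question asks for, the boolean result of eval can be converted in the same way as in the first answer. This is a minimal sketch, assuming the example data from the question and writing the sunshine condition as >= to match the stated requirement. It uses `and` instead of & purely for readability, and the name good_rows is made up for illustration.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    'sunshine': [8, 9, 6, 5],
    'rain': [2, 4, 0, 8],
    'temperatur': [22, 32, -12, 4],
    'id': [1, 2, 3, 4],
})

# Column names can be used directly inside the expression string.
expr = "sunshine >= 8 and rain <= 3 and 15 <= temperatur <= 25"

# eval returns a boolean Series; astype(int) turns it into 1/0 flags.
df['good_day'] = df.eval(expr).astype(int)

# query takes the same expression and returns the matching rows directly.
good_rows = df.query(expr)

print(df)
print(good_rows)
```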

CodePudding user response:

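You can also assign with .loc directly. Set the column to 0 first so that rows that do not match are not left with NaN or stale values:

df['good_day'] = 0
df.loc[(df['temperatur'] <= 25) & (df['temperatur'] >= 15) & (df['rain'] <= 3) & (df['sunshine'] >= 8), 'good_day'] = 1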