I want to set df['good_day'] = 1 when all of the following conditions hold:
sunshine >= 8
rain <= 3
15 <= temperatur <= 25
and df['good_day'] = 0 otherwise.
import pandas as pd

d = {'sunshine': [8, 9, 6, 5],
     'rain': [2, 4, 0, 8],
     'temperatur': [22, 32, -12, 4],
     'id': [1, 2, 3, 4]}
df = pd.DataFrame(data=d)
print(df)
   sunshine  rain  temperatur  id
0         8     2          22   1
1         9     4          32   2
2         6     0         -12   3
3         5     8           4   4
df['good_day'] = 0
df.loc[df['sunshine'] >= 8, 'good_day'] = 1
df.loc[df['rain'] <= 3, 'good_day'] = 1
df['good_day'] = df.where(df['temperatur'].between(15, 25), 1)
print(df)
   sunshine  rain  temperatur  id  good_day
0         8     2          22   1         8
1         9     4          32   2         1
2         6     0         -12   3         1
3         5     8           4   4         1
What I want
   sunshine  rain  temperatur  id  good_day
0         8     2          22   1         1
1         9     4          32   2         0
2         6     0         -12   3         0
3         5     8           4   4         0
CodePudding user response:
Use simple comparisons, combine the conditions with the element-wise AND operator (&), and finally convert the True/False booleans to 1/0:
mask = df['sunshine'].ge(8) & df['rain'].le(3) & df['temperatur'].between(15, 25)
df['good_day'] = mask.astype(int)
output:
   sunshine  rain  temperatur  id  good_day
0         8     2          22   1         1
1         9     4          32   2         0
2         6     0         -12   3         0
3         5     8           4   4         0
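The attempt in the question sets good_day to 1 as soon as any single condition holds, because each .loc assignment is applied independently; the conditions have to be combined into one mask first. As a minimal, self-contained sketch of the same idea written with plain comparison operators instead of .ge/.le/.between (the sample data is copied from the question):

```python
import pandas as pd

df = pd.DataFrame({
    'sunshine': [8, 9, 6, 5],
    'rain': [2, 4, 0, 8],
    'temperatur': [22, 32, -12, 4],
    'id': [1, 2, 3, 4],
})

# Each comparison needs its own parentheses, because & binds
# more tightly than >= and <= in Python.
mask = (
    (df['sunshine'] >= 8)
    & (df['rain'] <= 3)
    & (df['temperatur'] >= 15)
    & (df['temperatur'] <= 25)
)

# True/False becomes 1/0.
df['good_day'] = mask.astype(int)
print(df)
```

Only the first row satisfies all conditions at once, so it is the only row that gets a 1.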
CodePudding user response:
Alternatively, you can use:
df['good_day'] = df.apply(lambda x: int(x['sunshine'] >= 8 and x['rain'] <= 3 and x['temperatur'] >= 15 and x['temperatur'] <= 25), axis=1)
That said, I prefer the vectorized solution proposed above.
I checked the timings (1,000 runs each):
from timeit import timeit
timeit(lambda:(df['sunshine'].ge(8) & df['rain'].le(3) & df['temperatur'].ge(15) & df['temperatur'].le(25)).astype('int'), number=1_000)
# This returns 0.666049400002521
timeit(lambda:df.apply(lambda x:int(x['sunshine'] >= 8 and x['rain'] <=3 and x['temperatur'] >= 15 and x['temperatur'] <=25), axis=1), number=1_000)
# This returns 0.33816790000128094
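Timings like the ones above depend heavily on the size of the frame: on a handful of rows, fixed per-call overhead dominates, while row-wise apply has a cost that grows with every row. A rough sketch for repeating the comparison on a larger frame follows; the frame `big`, its size, the value ranges, and the helper names `vectorized` and `row_wise` are all assumptions made for illustration, and the numbers you get will vary by machine and pandas version.

```python
from timeit import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame with random values (size and ranges are arbitrary).
rng = np.random.default_rng(0)
n = 10_000
big = pd.DataFrame({
    'sunshine': rng.integers(0, 13, n),
    'rain': rng.integers(0, 11, n),
    'temperatur': rng.integers(-15, 36, n),
})


def vectorized(frame):
    # Column-wise comparisons combined with &.
    mask = (
        frame['sunshine'].ge(8)
        & frame['rain'].le(3)
        & frame['temperatur'].between(15, 25)
    )
    return mask.astype(int)


def row_wise(frame):
    # The same test evaluated once per row via apply.
    return frame.apply(
        lambda x: int(x['sunshine'] >= 8 and x['rain'] <= 3
                      and 15 <= x['temperatur'] <= 25),
        axis=1,
    )


print('vectorized:', timeit(lambda: vectorized(big), number=3))
print('row-wise:  ', timeit(lambda: row_wise(big), number=3))
```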
CodePudding user response:
Another alternative is to use eval, which I find quite readable; query is similarly handy.
expr = "sunshine >= 8 & rain <= 3 & 15 <= temperatur <= 25"
mask = df.eval(expr)
df.loc[mask] # Gets the rows with the relevant conditions
df.loc[~mask] # Gets the rows which do not conform to the conditions
The output is then:
df.loc[mask]
   sunshine  rain  temperatur  id
0         8     2          22   1
and for the negation:
df.loc[~mask]
   sunshine  rain  temperatur  id
1         9     4          32   2
2         6     0         -12   3
3         5     8           4   4
CodePudding user response:
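You can also use .loc with the combined condition. Initialize the column to 0 first, otherwise the rows that do not match are left as NaN:
df['good_day'] = 0
df.loc[(df['temperatur'] <= 25) & (df['temperatur'] >= 15) & (df['rain'] <= 3) & (df['sunshine'] >= 8), 'good_day'] = 1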
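A small sketch of why the initial value matters with this .loc approach: if good_day does not exist yet, the assignment creates it and leaves every non-matching row as NaN. The names `cond`, `without_init` and `with_init` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'sunshine': [8, 9, 6, 5],
    'rain': [2, 4, 0, 8],
    'temperatur': [22, 32, -12, 4],
    'id': [1, 2, 3, 4],
})

cond = (
    (df['temperatur'] <= 25)
    & (df['temperatur'] >= 15)
    & (df['rain'] <= 3)
    & (df['sunshine'] >= 8)
)

# Column does not exist beforehand: non-matching rows end up as NaN.
without_init = df.copy()
without_init.loc[cond, 'good_day'] = 1
print(without_init)

# Column initialized to 0 first: non-matching rows keep the 0.
with_init = df.copy()
with_init['good_day'] = 0
with_init.loc[cond, 'good_day'] = 1
print(with_init)
```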