I ran into some issues while using a for-loop and if conditions to update a dataframe. This should be very basic Python logic, but I couldn't find an explanation online, so I'd like to ask here.
For illustration purposes, let's look at a simple dataframe df:
   1  2
0  1  0
1  0  1
2  1  0
3  0  0
4  1  1
I wanted a third column based on values of the first two columns:
Initially I wrote:
for i in range(len(df)):
    if df.loc[i,'1']==1 & df.loc[i,'2']==0:
        df.loc[i,'3']=1
    else:
        df.loc[i,'3']=0
But I got this:
   1  2    3
0  1  0  0.0
1  0  1  0.0
2  1  0  0.0
3  0  0  1.0
4  1  1  0.0
Then I found that when I added brackets to my conditions it worked:
So instead of if df.loc[i,'1']==1 & df.loc[i,'2']==0:
I used if (df.loc[i,'1']==1) & (df.loc[i,'2']==0):
So why is this the case?
Besides, I was testing whether I would always need the bracket even when I only have one condition:
for i in range(len(df)):
    if df.loc[1,'2']==1:
        df.loc[1,'4']=0
    else:
        df.loc[1,'4']=1
This caused another problem: the new column is full of missing values, and only the cell df.loc[1,'4'] was updated:
   1  2    3    4
0  1  0  1.0  NaN
1  0  1  0.0  0.0
2  1  0  1.0  NaN
3  0  0  0.0  NaN
4  1  1  0.0  NaN
I'm really baffled and this time adding the bracket doesn't change anything. Why is it like this?
In addition to these two problems, is my method of updating cell values wrong generally speaking?
CodePudding user response:
A vectorized solution: chain the two masks with & (bitwise AND), then convert the result to integers to map True, False to 1, 0:
df['3'] = ((df['1'] == 1) & (df['2'] == 0)).astype(int)
Your solution works with scalars, so use and instead of &, which is meant for arrays (looping like this is not recommended):
for i in range(len(df)):
    if df.loc[i,'1']==1 and df.loc[i,'2']==0:
        df.loc[i,'3']=1
    else:
        df.loc[i,'3']=0
print (df)
   1  2    3
0  1  0  1.0
1  0  1  0.0
2  1  0  1.0
3  0  0  0.0
4  1  1  0.0
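To see why the brackets matter, here is a minimal sketch using plain Python integers in place of the dataframe cells (the names a and b are only for illustration and stand for one row's values in columns '1' and '2'). In Python, & binds more tightly than ==, so the version without brackets is read as a chained comparison rather than two separate tests:

```python
# Plain integers standing in for one row: a is column '1', b is column '2'.
a, b = 1, 0

# & binds tighter than ==, so this is the chained comparison
# a == (1 & b) == 0, i.e. (a == (1 & b)) and ((1 & b) == 0).
without_parens = a == 1 & b == 0

# Parentheses force both comparisons to be evaluated before &.
with_parens = (a == 1) & (b == 0)

# `and` has lower precedence than ==, so no parentheses are needed.
with_and = a == 1 and b == 0

print(without_parens, with_parens, with_and)  # False True True
```

With a=0 and b=0 the version without brackets evaluates to True, which matches the unexpected 1.0 in row 3 of the question's output.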
CodePudding user response:
You'd better use np.where:
import numpy as np
# each comparison needs its own parentheses because & binds tighter than ==
df['3'] = np.where((df['1']==1) & (df['2']==0), 1, 0)
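A self-contained sketch of the np.where approach, assuming the sample dataframe from the question. Each comparison is wrapped in parentheses, because on whole columns the unparenthesized form becomes a chained comparison that asks for the truth value of a Series and raises ValueError:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'1': [1, 0, 1, 0, 1], '2': [0, 1, 0, 0, 1]})

# Without parentheses this is df['1'] == (1 & df['2']) == 0, a chained
# comparison that needs bool(Series) and therefore fails.
try:
    df['1'] == 1 & df['2'] == 0
    raised = False
except ValueError:
    raised = True

# With parentheses, & combines the two boolean masks element-wise.
df['3'] = np.where((df['1'] == 1) & (df['2'] == 0), 1, 0)

print(raised)             # True
print(df['3'].tolist())   # [1, 0, 1, 0, 0]
```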
CodePudding user response:
Don't use a loop, this is an anti-pattern in pandas, use:
df['3'] = (df['1'].eq(1) & df['2'].eq(0)).astype(int)
df['4'] = df['2'].ne(1).astype(int)
# or, if only 0/1
# df['4'] = 1 - df['2']
Also, using eq in place of == avoids the need to wrap each comparison in parentheses to respect operator precedence.
Output:
   1  2  3  4
0  1  0  1  1
1  0  1  0  0
2  1  0  1  1
3  0  0  0  1
4  1  1  0  0
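As for the NaN values in column '4' of the question: the second loop indexes with the literal 1 instead of the loop variable i, so every iteration reads and writes row 1 only and the other rows are never filled. A minimal sketch of the corrected loop, assuming the sample dataframe from the question (the vectorized line above is still the better choice):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'1': [1, 0, 1, 0, 1], '2': [0, 1, 0, 0, 1]})

for i in range(len(df)):
    # Use the loop variable i, not the literal 1, so every row is visited.
    if df.loc[i, '2'] == 1:
        df.loc[i, '4'] = 0
    else:
        df.loc[i, '4'] = 1

# Every row is now filled; the values match df['2'].ne(1).astype(int).
print(df['4'].tolist())
```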
CodePudding user response:
# if column 1 equals 1 and column 2 equals 0, set column 3 to 1
df.loc[(df["1"] == 1) & (df["2"] == 0), "3"] = 1
# if column 1 is not equal to 1 or column 2 is not equal to 0, set column 3 to 0
df.loc[(df["1"] != 1) | (df["2"] != 0), "3"] = 0
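The second condition is just the negation of the first, so the same result can be had by building the mask once. A small sketch, assuming the sample dataframe from the question; the variable name mask is illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'1': [1, 0, 1, 0, 1], '2': [0, 1, 0, 0, 1]})

# Boolean mask: True where column '1' is 1 and column '2' is 0.
mask = (df['1'] == 1) & (df['2'] == 0)

# Start with 0 everywhere, then overwrite the matching rows with 1.
df['3'] = 0
df.loc[mask, '3'] = 1

# ~mask is the same as the "not equal ... or not equal" condition.
same = (~mask).equals((df['1'] != 1) | (df['2'] != 0))

print(df['3'].tolist())  # [1, 0, 1, 0, 0]
print(same)              # True
```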