My dataframe is like this:
df = pd.DataFrame({'A': [1,2,3], 'B': [1,4,5]})
If column A has the same value as column B, output 1, else 0.
I want the output to look like this:
   A  B  is_equal
0  1  1         1
1  2  4         0
2  3  5         0
I figured out that this works fine:
df['is_equal'] = np.where((df['A'] == df['B']), 1, 0)
But I want to use a lambda here because I used a similar line in another case before. This attempt does not work:
df['is_equals'] = df.apply(lambda x: 1 if df['A']==1 else 0, axis=1)
It throws the error ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Why does this error happen, and how can I fix the code?
Thank you in advance.
CodePudding user response:
What you are attempting is very inefficient. Do not do it. .apply
should not be used when other solutions are possible. The best solution is:
df['is_equal'] = (df['A'] == df['B']).astype(int)
But if you insist:
df.apply(lambda row: int(row['A'] == row['B']), axis=1)
The latter is 2.5 times slower. The original np.where
is the fastest.
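For reference, here is a minimal sketch that checks the three approaches above against the sample data from the question and times two of them. The larger `big` frame and the `timeit` settings are assumptions made for illustration; the actual ratio will depend on the data size and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 4, 5]})

# Vectorized comparison, cast from bool to int
vectorized = (df['A'] == df['B']).astype(int)

# np.where, as in the question
with_where = pd.Series(np.where(df['A'] == df['B'], 1, 0), index=df.index)

# Row-wise apply: each `row` is a Series holding one row's values
with_apply = df.apply(lambda row: int(row['A'] == row['B']), axis=1)

# All three should give the same column for the sample data
results_match = (
    vectorized.tolist() == with_where.tolist() == with_apply.tolist() == [1, 0, 0]
)
print(results_match)

# Rough timing on a larger, made-up frame (illustrative only)
big = pd.DataFrame({'A': np.arange(2000), 'B': np.arange(2000)})
t_vec = timeit.timeit(lambda: (big['A'] == big['B']).astype(int), number=3)
t_apply = timeit.timeit(
    lambda: big.apply(lambda row: int(row['A'] == row['B']), axis=1), number=3
)
print(f"vectorized: {t_vec:.4f}s, apply: {t_apply:.4f}s")
```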
CodePudding user response:
I agree with DYZ's opinion. But if you want to use .apply
anyway, I can suggest something like this:
df['is_equal'] = df.apply(lambda x: 1 if x.A == x.B else 0, axis=1)
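As for why the original attempt failed: inside the lambda, `df['A'] == 1` refers to the whole column rather than to the row `x` passed in, so the comparison returns a boolean Series, and `if` cannot reduce that to a single True or False. A minimal sketch on the sample data from the question, which reproduces the error and then applies the row-based version:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 4, 5]})

# The original lambda ignores its argument and compares the whole column,
# so `if` receives a boolean Series rather than a single bool.
try:
    df.apply(lambda x: 1 if df['A'] == 1 else 0, axis=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Using the row passed to the lambda yields one scalar comparison per row.
df['is_equal'] = df.apply(lambda x: 1 if x.A == x.B else 0, axis=1)
print(df)
```

With `axis=1`, each `x` is one row, so `x.A` and `x.B` are single values and the conditional expression is well defined.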