I have DataFrame as given:
TXN_DATE_TIME TX_ID CUST_ID STATE_1 STATE_2 STATE_3
01-06-2020 00:00 1 123 Maharashtra Maharashtra Maharashtra
01-06-2020 00:00 2 345 Pune Chennai Gujarat
01-06-2020 00:00 3 222 Chennai Gujarat Chennai
01-06-2020 00:00 4 1356 Gujarat Chennai Delhi
01-06-2020 00:00 5 2345 Punjab Punjab Delhi
01-06-2020 00:00 6 1111 Haryana Delhi Punjab
01-06-2020 00:00 7 5678 Delhi Maharashtra Haryana
01-06-2020 00:00 8 9999 Kerela Assam Assam
01-06-2020 00:00 9 2345 Assam Assam Assam
01-06-2020 00:00 10 6666 Tripura Tripura Tripura
01-06-2020 00:00 11 7896 Kolkatta Kolkatta Kolkatta
I want to create a new column match in the df containing two values Match and No match based on the following conditions:
If State_1==State_2==STATE_3 Then Match=1
Else Match=0
Hence the expected df
will be:
I tried to use np.where
in pandas df
by using:
df['MATCH']=np.where(df['STATE_1']==df['STATE_2']==df['STATE_3'],1,0)
But it gives me the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I wanted to know that is there any other faster method other than np.where that can be used to achieve the expected result and if not then how can I avoid this error? ?
CodePudding user response:
Use:
df['MATCH']=(df['STATE_1']==df['STATE_2'])&(df['STATE_2']==df['STATE_3'])
df['MATCH'] = df['MATCH'].astype(int)
CodePudding user response:
We could also use eq
all
:
df['MATCH'] = df[['STATE_2', 'STATE_3']].eq(df['STATE_1'], axis=0).all(axis=1).astype(int)
If you have more than 3 "STATE"s, we could use filter
:
df['MATCH'] = df.filter(like='STATE').eq(df['STATE_1'], axis=0).all(axis=1)
Output:
TXN_DATE_TIME TX_ID CUST_ID STATE_1 STATE_2 STATE_3 MATCH
0 01-06-2020 00:00 1 123 Maharashtra Maharashtra Maharashtra 1
1 01-06-2020 00:00 2 345 Pune Chennai Gujarat 0
2 01-06-2020 00:00 3 222 Chennai Gujarat Chennai 0
3 01-06-2020 00:00 4 1356 Gujarat Chennai Delhi 0
4 01-06-2020 00:00 5 2345 Punjab Punjab Delhi 0
5 01-06-2020 00:00 6 1111 Haryana Delhi Punjab 0
6 01-06-2020 00:00 7 5678 Delhi Maharashtra Haryana 0
7 01-06-2020 00:00 8 9999 Kerela Assam Assam 0
8 01-06-2020 00:00 9 2345 Assam Assam Assam 1
9 01-06-2020 00:00 10 6666 Tripura Tripura Tripura 1
10 01-06-2020 00:00 11 7896 Kolkatta Kolkatta Kolkatta 1