df
CAR1 CAR2
['ford','hyundai'] ['ford','hyundai']
['ford','hyundai'] ['hyundai','nissan']
['ford','hyundai'] ['bmw', 'audi']
Expected output:
CAR1 CAR2 Flag
['ford','hyundai'] ['ford','hyundai'] 1
['ford','hyundai'] ['hyundai','nissan'] 1
['ford','hyundai'] ['bmw', 'audi'] 0
Set Flag to 1 if any element (string) from CAR1 also appears in CAR2; otherwise set Flag to 0.
My try is:
df[[x in y for x,y in zip(df['CAR1'], df['CAR2'])]
CodePudding user response:
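EDIT: if the columns contain string representations of lists, first convert them to real lists:
import ast
cols = ['CAR1','CAR2']
# literal_eval parses one string at a time, so map it over each column
df[cols] = df[cols].apply(lambda col: col.map(ast.literal_eval))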
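To see why this conversion matters, here is a minimal plain-Python sketch (no pandas; the variable names and sample strings are only illustrative). If a cell is still a string, set() splits it into single characters, so two cells can appear to overlap just because they share brackets, quotes, or letters:

```python
import ast

# Cells as they might look when the lists were stored as text (assumed sample values)
raw1 = "['ford','hyundai']"
raw2 = "['bmw', 'audi']"

# On raw strings, set() yields individual characters, so the overlap is spurious
spurious = set(raw1).intersection(raw2)

# After literal_eval, each cell is a real list of car names
cars1 = ast.literal_eval(raw1)
cars2 = ast.literal_eval(raw2)

# Now the intersection compares whole names and is empty for these two cells
real = set(cars1).intersection(cars2)
```

With real lists the two cells share nothing, whereas the raw strings share characters such as "[" and "'".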
Use set.intersection in a list comprehension, converting the result to a boolean and then to an integer to map True/False to 1/0:
df['Flag'] = [int(bool(set(x).intersection(y))) for x,y in zip(df['CAR1'], df['CAR2'])]
Alternative solution:
df['Flag'] = [1 if set(x).intersection(y) else 0 for x,y in zip(df['CAR1'], df['CAR2'])]
print (df)
CAR1 CAR2 Flag
0 [ford, hyundai] [ford, hyundai] 1
1 [ford, hyundai] [hyundai, nissan] 1
2 [ford, hyundai] [bmw, audi] 0
CodePudding user response:
You can use set operations in a list comprehension (isdisjoint returns False if the sets overlap; 1-x inverts that result and converts it to an integer):
df['Flag'] = [1-set(s1).isdisjoint(s2) for s1, s2 in zip(df['CAR1'], df['CAR2'])]
NB. isdisjoint is quite fast because it does not need to read the full sets: it returns False as soon as a common item is found.
Output:
CAR1 CAR2 Flag
0 [ford, hyundai] [ford, hyundai] 1
1 [ford, hyundai] [hyundai, nissan] 1
2 [ford, hyundai] [bmw, audi] 0
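As a quick sanity check of the isdisjoint approach, here is a small plain-Python sketch that runs without pandas. The helper name overlap_flag and the rows list are hypothetical, made up for illustration; the data mirrors the three rows from the question:

```python
def overlap_flag(a, b):
    """Return 1 if the two iterables share at least one element, else 0."""
    # isdisjoint accepts any iterable as its argument, so only `a` is turned into a set
    return int(not set(a).isdisjoint(b))

# The three (CAR1, CAR2) pairs from the question
rows = [
    (['ford', 'hyundai'], ['ford', 'hyundai']),
    (['ford', 'hyundai'], ['hyundai', 'nissan']),
    (['ford', 'hyundai'], ['bmw', 'audi']),
]

flags = [overlap_flag(a, b) for a, b in rows]

# Same flags via intersection, which builds the full set of common items first
flags_via_intersection = [int(bool(set(a).intersection(b))) for a, b in rows]
```

Both lists come out the same; the difference is only that isdisjoint can stop at the first shared item.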
If the columns contain strings, parse them first:
from ast import literal_eval
df['Flag'] = [1-set(s1).isdisjoint(s2) for s1, s2 in
zip(df['CAR1'].apply(literal_eval),
df['CAR2'].apply(literal_eval))]