Checking if any string element of a column matches any element of another column's string list in Python

     CAR1                        CAR2
['ford','hyundai']         ['ford','hyundai']
['ford','hyundai']         ['hyundai','nissan']
['ford','hyundai']         ['bmw', 'audi']

Expected output:

 CAR1                        CAR2                   Flag
['ford','hyundai']         ['ford','hyundai']        1
['ford','hyundai']         ['hyundai','nissan']      1
['ford','hyundai']         ['bmw', 'audi']           0

Set the flag to 1 if any element/string of CAR1 also appears in CAR2; otherwise set it to 0.

My attempt:

df[[x in y for x, y in zip(df['CAR1'], df['CAR2'])]]
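The attempt does not test for overlap: `x in y` asks whether the whole list `x` is itself one element of `y`, not whether the two lists share a string. A small illustration with plain lists taken from the first row of the sample data:

```python
car1 = ['ford', 'hyundai']
car2 = ['ford', 'hyundai']

# Membership test: is the list car1 itself an element of car2?
# It is not, because car2 contains strings, not lists.
whole_list_in = car1 in car2

# Overlap test: does at least one string of car1 appear in car2?
any_shared = any(name in car2 for name in car1)

print(whole_list_in)  # False
print(any_shared)     # True
```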

Answer:

EDIT: if the columns hold strings that look like lists (for example after reading a CSV), first convert them to real lists:

import ast

cols = ['CAR1','CAR2']
# DataFrame.apply passes whole columns, so map literal_eval over each cell
df[cols] = df[cols].apply(lambda col: col.map(ast.literal_eval))
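A minimal, self-contained sketch of the conversion step, assuming the columns were loaded as strings such as "['ford','hyundai']". Because `DataFrame.apply` hands each whole column (a Series) to the function, this sketch uses `Series.map` inside it so that `ast.literal_eval` is called once per cell:

```python
import ast

import pandas as pd

# Assumed sample data: list-like strings, as they might appear after read_csv
df = pd.DataFrame({
    'CAR1': ["['ford','hyundai']"] * 3,
    'CAR2': ["['ford','hyundai']", "['hyundai','nissan']", "['bmw', 'audi']"],
})

cols = ['CAR1', 'CAR2']
# Each column arrives as a Series; Series.map parses its cells one by one
df[cols] = df[cols].apply(lambda col: col.map(ast.literal_eval))

print(type(df.loc[0, 'CAR1']))  # <class 'list'>
print(df.loc[2, 'CAR2'])        # ['bmw', 'audi']
```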

Use set.intersection in a list comprehension, then convert the result to bool and then to int, so that True/False map to 1/0:

df['Flag'] = [int(bool(set(x).intersection(y))) for x,y in zip(df['CAR1'], df['CAR2'])]
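An end-to-end sketch of this approach on the sample data, assuming the columns already hold Python lists. An empty intersection is `set()`, which is falsy, so it becomes 0; any shared name gives a non-empty set, which becomes 1:

```python
import pandas as pd

# Sample data from the question, with real lists in each cell
df = pd.DataFrame({
    'CAR1': [['ford', 'hyundai']] * 3,
    'CAR2': [['ford', 'hyundai'], ['hyundai', 'nissan'], ['bmw', 'audi']],
})

# set() -> False -> 0; non-empty intersection -> True -> 1
df['Flag'] = [int(bool(set(x).intersection(y)))
              for x, y in zip(df['CAR1'], df['CAR2'])]

print(df['Flag'].tolist())  # [1, 1, 0]
```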

Alternative solution:

df['Flag'] = [1 if set(x).intersection(y) else 0 for x,y in zip(df['CAR1'], df['CAR2'])]

print(df)
              CAR1               CAR2  Flag
0  [ford, hyundai]    [ford, hyundai]     1
1  [ford, hyundai]  [hyundai, nissan]     1
2  [ford, hyundai]        [bmw, audi]     0

Answer:

You can use set operations in a list comprehension. isdisjoint returns False if the sets share an element; 1-x then inverts that result and turns it into an integer, because True and False behave as 1 and 0:

df['Flag'] = [1-set(s1).isdisjoint(s2) for s1, s2 in zip(df['CAR1'], df['CAR2'])]
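The `1 - ...` trick works because `bool` is a subclass of `int`. A plain-Python sketch of the same rule on the three sample rows (the `flag` helper name is hypothetical, used only for illustration):

```python
def flag(s1, s2):
    # Hypothetical helper: isdisjoint is True when nothing is shared,
    # and 1 - True == 0, 1 - False == 1
    return 1 - set(s1).isdisjoint(s2)

rows = [
    (['ford', 'hyundai'], ['ford', 'hyundai']),
    (['ford', 'hyundai'], ['hyundai', 'nissan']),
    (['ford', 'hyundai'], ['bmw', 'audi']),
]
flags = [flag(a, b) for a, b in rows]

print(flags)  # [1, 1, 0]
```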

NB. isdisjoint is fast because it does not need to scan its whole argument: it returns False as soon as a common item is found.
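The early exit can be observed by passing a generator that records what it yields (`tracked` is a hypothetical helper for illustration). In CPython, when the argument is not a set, `isdisjoint` iterates over it and stops at the first element found in the set. Note that `set(s1)` itself still reads all of `s1`:

```python
consumed = []

def tracked(items):
    # Hypothetical helper: yields items while recording each one handed out
    for item in items:
        consumed.append(item)
        yield item

cars = {'ford', 'hyundai'}
result = cars.isdisjoint(tracked(['hyundai', 'nissan', 'bmw', 'audi']))

print(result)    # False: 'hyundai' is shared
print(consumed)  # ['hyundai']: iteration stopped after the first match
```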

Output:

              CAR1               CAR2  Flag
0  [ford, hyundai]    [ford, hyundai]     1
1  [ford, hyundai]  [hyundai, nissan]     1
2  [ford, hyundai]        [bmw, audi]     0

If the columns contain strings rather than lists, parse them on the fly with literal_eval:

from ast import literal_eval

df['Flag'] = [1-set(s1).isdisjoint(s2) for s1, s2 in
               zip(df['CAR1'].apply(literal_eval),
                   df['CAR2'].apply(literal_eval))]
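A self-contained run of the string-parsing variant, with sample string data assumed to mirror the question. `Series.apply` calls `literal_eval` on each cell, and because the parsed lists are only used inside the comprehension, CAR1 and CAR2 remain strings in the DataFrame:

```python
from ast import literal_eval

import pandas as pd

# Assumed sample data stored as list-like strings
df = pd.DataFrame({
    'CAR1': ["['ford','hyundai']"] * 3,
    'CAR2': ["['ford','hyundai']", "['hyundai','nissan']", "['bmw', 'audi']"],
})

# Parse each cell to a list, then flag rows whose lists overlap
df['Flag'] = [1 - set(s1).isdisjoint(s2) for s1, s2 in
              zip(df['CAR1'].apply(literal_eval),
                  df['CAR2'].apply(literal_eval))]

print(df['Flag'].tolist())  # [1, 1, 0]
```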