Checking if any string element of the column is matching with other column string list in python

Time:12-22

Input df:

     CAR1                        CAR2
['ford','hyundai']         ['ford','hyundai']
['ford','hyundai']         ['hyundai','nissan']
['ford','hyundai']         ['bmw', 'audi']

Expected output:

 CAR1                        CAR2                   Flag
['ford','hyundai']         ['ford','hyundai']        1
['ford','hyundai']         ['hyundai','nissan']      1
['ford','hyundai']         ['bmw', 'audi']           0 

Set Flag to 1 if any element (string) of CAR1 also appears in CAR2; otherwise set it to 0.

My attempt:

df[[x in y for x,y in zip(df['CAR1'], df['CAR2'])]]
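Even with the closing bracket in place, this attempt selects no rows: `x in y` asks whether the whole list `x` is one element of `y`, not whether the two lists share an element. A minimal plain-Python sketch contrasting the two checks on one row copied from the example; the `any(...)` generator is just one possible element-wise check, not taken from the answers below:

```python
car1 = ['ford', 'hyundai']
car2 = ['hyundai', 'nissan']

# Whole-list membership: is the list ['ford', 'hyundai'] itself an element of car2?
whole_list_in = car1 in car2          # False: car2 holds strings, not lists

# Element-wise check: does any string of car1 appear in car2?
any_shared = any(car in car2 for car in car1)   # True: 'hyundai' is shared

print(whole_list_in, any_shared)  # False True
```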

Answer:

EDIT: if the columns contain string representations of lists (for example after reading a CSV), first convert them to real lists:

import ast

cols = ['CAR1','CAR2']
# DataFrame.apply would pass whole columns, so parse each cell with Series.apply
for col in cols:
    df[col] = df[col].apply(ast.literal_eval)
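A self-contained sketch of the conversion step, assuming the columns were loaded as strings such as "['ford','hyundai']". It parses column by column with `Series.apply`, because `DataFrame.apply` hands whole columns (Series) to the function, and `ast.literal_eval` only accepts a string or an AST node:

```python
import ast

import pandas as pd

# Assumed input: list-like strings, as produced e.g. by pd.read_csv
df = pd.DataFrame({
    'CAR1': ["['ford','hyundai']", "['ford','hyundai']", "['ford','hyundai']"],
    'CAR2': ["['ford','hyundai']", "['hyundai','nissan']", "['bmw', 'audi']"],
})

cols = ['CAR1', 'CAR2']
for col in cols:
    # Series.apply calls literal_eval on each cell (a string) and stores the resulting list
    df[col] = df[col].apply(ast.literal_eval)

print(type(df.loc[1, 'CAR2']), df.loc[1, 'CAR2'])  # <class 'list'> ['hyundai', 'nissan']
```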

Use set.intersection in a list comprehension, convert the result to bool (an empty set is False) and then to int, which maps True/False to 1/0:

df['Flag'] = [int(bool(set(x).intersection(y))) for x,y in zip(df['CAR1'], df['CAR2'])]
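A runnable version of the intersection approach, building the example frame with real lists (the DataFrame construction is an assumption; the question only shows the printed table):

```python
import pandas as pd

df = pd.DataFrame({
    'CAR1': [['ford', 'hyundai'], ['ford', 'hyundai'], ['ford', 'hyundai']],
    'CAR2': [['ford', 'hyundai'], ['hyundai', 'nissan'], ['bmw', 'audi']],
})

# set(x).intersection(y) is empty when the rows share nothing;
# bool(empty set) is False and int(False) is 0
df['Flag'] = [int(bool(set(x).intersection(y))) for x, y in zip(df['CAR1'], df['CAR2'])]

print(df['Flag'].tolist())  # [1, 1, 0]
```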

Alternative solution:

df['Flag'] = [1 if set(x).intersection(y) else 0 for x,y in zip(df['CAR1'], df['CAR2'])]

print (df)
              CAR1               CAR2  Flag
0  [ford, hyundai]    [ford, hyundai]     1
1  [ford, hyundai]  [hyundai, nissan]     1
2  [ford, hyundai]        [bmw, audi]     0

Answer:

You can use set operations in a list comprehension: isdisjoint returns False if the sets share at least one element, and 1 - x inverts that boolean and converts it to an integer (1 - False == 1, 1 - True == 0):

df['Flag'] = [1-set(s1).isdisjoint(s2) for s1, s2 in zip(df['CAR1'], df['CAR2'])]

NB. isdisjoint is fast because it doesn't need to scan the whole input: it returns False as soon as a common item is found.
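A small sketch of the early exit, assuming CPython, where set.isdisjoint accepts any iterable and stops at the first shared element (this is implementation behavior rather than a documented guarantee). The generator car2_items is hypothetical and only records which CAR2 items were actually pulled:

```python
consumed = []  # records which CAR2 items isdisjoint actually read

def car2_items():
    # hypothetical generator standing in for one CAR2 list
    for name in ['hyundai', 'nissan', 'bmw', 'audi']:
        consumed.append(name)
        yield name

# 'hyundai' is the first item and is shared, so iteration stops there
flag = 1 - {'ford', 'hyundai'}.isdisjoint(car2_items())

print(flag, consumed)  # 1 ['hyundai']
```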

Output:

              CAR1               CAR2  Flag
0  [ford, hyundai]    [ford, hyundai]     1
1  [ford, hyundai]  [hyundai, nissan]     1
2  [ford, hyundai]        [bmw, audi]     0

If the columns contain strings rather than lists, parse them inside the comprehension:

from ast import literal_eval

df['Flag'] = [1-set(s1).isdisjoint(s2) for s1, s2 in
               zip(df['CAR1'].apply(literal_eval),
                   df['CAR2'].apply(literal_eval))]
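An end-to-end sketch of the string variant, assuming the columns hold list-like strings. Unlike converting the columns first, this parses on the fly, so CAR1 and CAR2 are left as strings and only Flag is added:

```python
from ast import literal_eval

import pandas as pd

# Assumed input: the list columns are stored as strings
df = pd.DataFrame({
    'CAR1': ["['ford','hyundai']", "['ford','hyundai']", "['ford','hyundai']"],
    'CAR2': ["['ford','hyundai']", "['hyundai','nissan']", "['bmw', 'audi']"],
})

# Parse each cell while iterating; the original columns are not modified
df['Flag'] = [1 - set(s1).isdisjoint(s2) for s1, s2 in
              zip(df['CAR1'].apply(literal_eval),
                  df['CAR2'].apply(literal_eval))]

print(df['Flag'].tolist())  # [1, 1, 0]
```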