Reconcile Pandas Dataframe based on same wording Found-CodePudding

I have below dataframe,

df = pd.DataFrame([['Coca-Cola','Coca-Cola Ltd Co'], ['BMW','Company BMW Ltd'], ['Nike','Adidas Ltd and Co.']], columns=['Brand','Company Name'])

where I need to reconcile the "status" column based on same wording found in 'brand' and 'company name' columns:

Brand             Company Name

Coca-Cola         Coca-Cola Ltd Co
BMW               Company BMW Ltd
Nike              Adidas Ltd and Co.

Ideally I would like my 'status' column to return 'same' to same brand and company name and 'not' for different ones as below:

    Brand             Company Name     Status

Coca-Cola         Coca-Cola            same
BMW               Company BMW Ltd      same
Nike              Nike Ltd and Co.     not

The current method I have is working but not for all as some company names come in between or differently (with full company name)

my_list=[]
for brands,names in zip(df.Brand,df["Company Name"]):
    if brands==names:
       my_list.append('same')
    else:
       my_list.append('not')

Please share your suggestions on how I can reconcile it based on same wording found in both columns. Thanks.

CodePudding user response：

If brand names aren't multiple words, we can simply split Company Name on white space and look up membership:

df['Status'] = df.apply(lambda x: 'same' if x['Brand'] in x['Company Name'].split() else 'not', axis=1)

Output:

       Brand        Company Name Status
0  Coca-Cola    Coca-Cola Ltd Co   same
1        BMW     Company BMW Ltd   same
2       Nike  Adidas Ltd and Co.    not

CodePudding user response：

You can use List comprehension for better timings. zip the columns into one and check whether Brand exists in Company Name or not:

In [251]: df['Status'] = ['same' if x in y else 'not' for x,y in zip(df['Brand'], df['Company Name'].str.split())]

In [252]: df
Out[252]: 
       Brand        Company Name Status
0  Coca-Cola    Coca-Cola Ltd Co   same
1        BMW     Company BMW Ltd   same
2       Nike  Adidas Ltd and Co.    not

Timings:

In [261]: def f1():
     ...:     df['Status'] = df.apply(lambda x: 'same' if x['Brand'] in x['Company Name'].split() else 'not', axis=1)
     ...: 

In [259]: def f2(): 
     ...:     df['Status'] = ['same' if x in y else 'not' for x,y in zip(df['Brand'], df['Company Name'].str.split())]

In [262]: %timeit f1() # @Manlai's solution
468 µs ± 15.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [263]: %timeit f2() # my solution
360 µs ± 6.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)