Iterating over 2 columns and comparing similarities in Python

Time:01-13

I have a DataFrame that looks like this:

Row      Account_Name_HGI           company_name_Ignite
1        00150042 plc               WAGON PLC
2        01 telecom, ltd.           01 TELECOM LTD
3        0404 investments limited   0404 Investments Ltd

What I am trying to do is iterate through the Account_Name_HGI and company_name_Ignite columns, compare the two strings in each row, and get a similarity score for each pair. I already have code that computes the score:

from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

That gives me the similarity score I want, but I am stuck on how to write a for loop that iterates over the two columns and returns the score for each row. Any help will be appreciated.

CodePudding user response:

Use a list comprehension that zips both columns:

from difflib import SequenceMatcher

# Pair up the two columns row by row and score each pair
df['ratio'] = [SequenceMatcher(None, a, b).ratio()
               for a, b
               in zip(df['Account_Name_HGI'], df['company_name_Ignite'])]

print(df)
   Row          Account_Name_HGI   company_name_Ignite     ratio
0    1              00150042 plc             WAGON PLC  0.095238
1    2          01 telecom, ltd.        01 TELECOM LTD  0.266667
2    3  0404 investments limited  0404 Investments Ltd  0.818182
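
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes pandas is installed and rebuilds the three sample rows from the question inline; a real DataFrame would of course be loaded from wherever your data lives.

```python
from difflib import SequenceMatcher

import pandas as pd

# Sample rows copied from the question; stand-ins for the real data.
df = pd.DataFrame({
    "Row": [1, 2, 3],
    "Account_Name_HGI": [
        "00150042 plc",
        "01 telecom, ltd.",
        "0404 investments limited",
    ],
    "company_name_Ignite": [
        "WAGON PLC",
        "01 TELECOM LTD",
        "0404 Investments Ltd",
    ],
})

# zip() walks both columns in lockstep, yielding one (a, b) pair per row.
df["ratio"] = [
    SequenceMatcher(None, a, b).ratio()
    for a, b in zip(df["Account_Name_HGI"], df["company_name_Ignite"])
]

print(df)
```

The list comprehension produces one score per row, in row order, so assigning it to a new column lines the scores up with the names they came from.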

CodePudding user response:

Use a list comprehension with zip:

from difflib import SequenceMatcher

# similar() is the helper function defined in the question
df['ratio'] = [similar(a, b) for a, b in
               zip(df['Account_Name_HGI'], df['company_name_Ignite'])]

# or directly, without the custom function
df['ratio'] = [SequenceMatcher(None, a, b).ratio() for a, b in
               zip(df['Account_Name_HGI'], df['company_name_Ignite'])]

Output:

   Row          Account_Name_HGI   company_name_Ignite     ratio
0    1              00150042 plc             WAGON PLC  0.095238
1    2          01 telecom, ltd.        01 TELECOM LTD  0.266667
2    3  0404 investments limited  0404 Investments Ltd  0.818182
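
One thing worth noting about these scores: SequenceMatcher compares characters exactly, so uppercase and lowercase letters count as different. That is why '01 telecom, ltd.' against '01 TELECOM LTD' scores only 0.27. If case should not matter for your matching (an assumption on my part, the question does not say), a rough sketch like the one below normalizes case before comparing. The name similar_ci is made up for illustration, and plain lists stand in for the two columns.

```python
from difflib import SequenceMatcher


def similar_ci(a, b):
    """Hypothetical case-insensitive variant of similar() from the question."""
    # casefold() lowercases both names before they are compared.
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


# Plain lists standing in for the two DataFrame columns.
hgi = ["00150042 plc", "01 telecom, ltd.", "0404 investments limited"]
ignite = ["WAGON PLC", "01 TELECOM LTD", "0404 Investments Ltd"]

# Score each pair twice: as-is, and with case normalized.
raw = [SequenceMatcher(None, a, b).ratio() for a, b in zip(hgi, ignite)]
folded = [similar_ci(a, b) for a, b in zip(hgi, ignite)]

for a, b, r, f in zip(hgi, ignite, raw, folded):
    print(f"{a!r} vs {b!r}: raw={r:.6f} casefolded={f:.6f}")
```

With a DataFrame, similar_ci would slot into the list comprehension in place of similar. Whether the higher scores are what you want depends on how strict the matching needs to be.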