I have a DF that looks like this:
Row Account_Name_HGI company_name_Ignite
1 00150042 plc WAGON PLC
2 01 telecom, ltd. 01 TELECOM LTD
3 0404 investments limited 0404 Investments Ltd
what I am trying to do is to iterate through the Account_Name_HGI
and the company_name_Ignite
columns and compare the 2 strings in row 1 and provide me with a similarity score. I have got the code that provides the score:
from difflib import SequenceMatcher
def similar(a, b):
return SequenceMatcher(None, a, b).ratio()
And that brings the similarity score that I want but I am having an issue with the logic on how to create a for loop that will iterate over the 2 columns and return the similarity score. Any help will be appreciated.
CodePudding user response:
Use list comprehension with zipping both columns:
from difflib import SequenceMatcher
df['ratio'] = [SequenceMatcher(None, a, b).ratio()
for a, b
in zip(df['Account_Name_HGI'], df['company_name_Ignite'])]
print (df)
Row Account_Name_HGI company_name_Ignite ratio
0 1 00150042 plc WAGON PLC 0.095238
1 2 01 telecom, ltd. 01 TELECOM LTD 0.266667
2 3 0404 investments limited 0404 Investments Ltd 0.818182
CodePudding user response:
Use a list comprehension with zip
:
from difflib import SequenceMatcher
df['ratio'] = [similar(a, b) for a, b in
zip(df['Account_Name_HGI'], df['company_name_Ignite'])]
# or directly without your custom function
df['ratio'] = [SequenceMatcher(None, a, b).ratio() for a,b in
zip(df['Account_Name_HGI'], df['company_name_Ignite'])
]
Output:
Row Account_Name_HGI company_name_Ignite ratio
0 1 00150042 plc WAGON PLC 0.095238
1 2 01 telecom, ltd. 01 TELECOM LTD 0.266667
2 3 0404 investments limited 0404 Investments Ltd 0.818182