I have 2 Excel files which contains names as the only column:
File 1: file1.xlsx
Names
Vinay adkz
Sagarbhansali
Jeffery Jonas
Kiara Francis
Dominic
File 2: file2.xlsx
Names:
bhansali Sagar
Dominic
Jenny
adkzVinay
Sample Output:
I want to match the names in file 1 with names in file 2, and i am trying to get an output like the below :
Names File2Matchname. Match%
Vinay adkz. adkzVinay. 98%
Sagarbhansali. bhansali sagar 97%
Jeffery Jonas NA 0%
Kiara Francis NA 0%
Dominic Dominic 100%
Is there any logic by which the above logic can be arrived in python ?
I tried to do this in Excel but vlookup doesn't help with match%. I know this is possible with python using cousine similarity but i am unable to get the logic in which the output can be arrived.
Any help would be much appreciated.
CodePudding user response:
You can use Pandas and use python's built-in difflib library which has a function called difflib.SequenceMatcher() function to find the longest common substring between the two names.
Example code:
import pandas as pd
import difflib
#For testing
dict_lists = {"Names":["Vinay adkz", "Shailesh", "Seema", "Nidhi","Ajitesh"]}
dict_lists2 = {"Names":["Ajitesh", "Vinay adkz", "Seema", "Nid"]}
# Excel to dataframes
df1 = pd.DataFrame(dict_lists) #pd.read_excel('file1.xlsx')
df2 = pd.DataFrame(dict_lists2) #pd.read_excel('file2.xlsx')
# Empty lists to stor matched name, match percentage
match_name = []
match_percent = []
# Iterate through the first dataframe
for i, row in df1.iterrows():
name = row['Names']
match = difflib.get_close_matches(name, df2['Names'], n=1, cutoff=0.8)
if match:
match_name.append(match[0])
match_string = difflib.SequenceMatcher(None, name, match[0]).find_longest_match(0, len(name), 0, len(match[0]))
match_percentage = (match_string.size / len(name)) * 100
match_percent.append(match_percentage)
else:
match_name.append('NA')
match_percent.append(0)
df1['File2names'] = match_name
df1['Match_per'] = match_percent
print(df1)
# Write in Excel
# df1.to_excel('output.xlsx', index=False)
I hope this helps you. This is the first time I am answering a question here. Read also: How to use SequenceMatcher to find similarity between two strings?