Compare two strings and increase the count of the matching string in Python

Consider the data below:

diction = {'A_B_D_E_F':0,
          'B_C_E':0,
          'A_D_E':0}

string = 'A_E_B_F'

I want to increase the count of the key in 'diction' that matches 'string' by the highest percentage. In this case the count of 'A_B_D_E_F' should be increased to 1: if we ignore 'E' in the string, it mostly matches 'A_B_D_E_F'. Note: the order of the elements in the string should be the main factor.

I looked at "String similarity in Python", but I am not sure whether those approaches consider the order of the string's contents.

New example:

diction = {'HEADER_Switchprofileoptionclicked__HEADER_profilechangedto:PROD_CE_VIEWER__HEADER_profilechangedto:PROD_CE_ADMIN__HEADER_Switchprofilesubmitbuttonclicked__CDPR_PageLoad__HEADER_Navigatedto:ShortfallAutomationDashboard': 0,
 'HEADER_Switchprofileoptionclicked__HEADER_profilechangedto:PROD_CE_VIEWER__HEADER_profilechangedto:PROD_CE_ADMIN__HEADER_Switchprofilepop-upclosed__CDPR_PageLoad__HEADER_Navigatedto:ShortfallAutomationDashboard': 0,
 'HEADER_Switchprofileoptionclicked__HEADER_profilechangedto:PROD_CE_ADMIN__HEADER_Switchprofilesubmitbuttonclicked__CDPR_PageLoad__HEADER_Navigatedto:ShortfallAutomationDashboard': 0,
 'HEADER_Switchprofileoptionclicked__HEADER_profilechangedto:PROD_CE_ADMIN__HEADER_Switchprofilepop-upclosed__CDPR_PageLoad__HEADER_Navigatedto:ShortfallAutomationDashboard': 0}


string = 'HEADER_Switchprofileoptionclicked___HEADER_profilechangedto:PROD_CE_VIEWER___HEADER_profilechangedto:PROD_CE_ADMIN___HEADER_Switchprofilesubmitbuttonclicked___HEADER_Switchprofilepop-upclosed___CDPR_PageLoad___HEADER_Switchprofileoptionclicked___HEADER_profilechangedto:PROD_CE_ADMIN___HEADER_Switchprofilepop-upclosed___HEADER_Switchprofilesubmitbuttonclicked___CDPR_PageLoad___HEADER_Navigatedto:ShortfallAutomationDashboard'

CodePudding user response：

Try a dictionary comprehension:

>>> {k: (v + 1 if all(x in k.split('_') for x in string.split('_')) else v) for k, v in diction.items()}
{'A_B_D_E_F': 1, 'B_C_E': 0, 'A_D_E': 0}
>>>

Or better assign first:

>>> lst = string.split('_')
>>> {k: (v + 1 if all(x in k.split('_') for x in lst) else v) for k, v in diction.items()}
{'A_B_D_E_F': 1, 'B_C_E': 0, 'A_D_E': 0}
>>>

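How this works: it adds 1 to a key's value if all of the elements of string (A, E, B and F) appear in the key name; otherwise it keeps the value at 0. Note that this check only tests membership, so it ignores the order of the elements, and it can increment more than one key.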
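Since the question asks for order to be the main factor, here is a minimal sketch of an order-aware score, assuming that "matching percentage" can be read as the share of tokens that appear in the same relative order (a longest common subsequence over tokens). The helper names `tokenize`, `lcs_length` and `order_score` are made up for this illustration and are not part of the answer above.

```python
import re

def tokenize(text):
    # Split on runs of underscores so '_', '__' and '___' all act as separators.
    return [t for t in re.split(r'_+', text) if t]

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def order_score(query, key):
    """Share of tokens that line up in the same order, between 0.0 and 1.0."""
    q, k = tokenize(query), tokenize(key)
    if not q and not k:
        return 1.0
    return 2 * lcs_length(q, k) / (len(q) + len(k))

diction = {'A_B_D_E_F': 0, 'B_C_E': 0, 'A_D_E': 0}
string = 'A_E_B_F'

# Increment only the single best-scoring key (first key wins on ties).
best_key = max(diction, key=lambda k: order_score(string, k))
diction[best_key] += 1
print(diction)  # {'A_B_D_E_F': 1, 'B_C_E': 0, 'A_D_E': 0}
```

Here 'A_B_D_E_F' shares three tokens in order with the string (A, B, F or A, E, F), which is the "ignore E" match described in the question. Splitting on runs of underscores is a design choice that should also cope with the longer example, where the keys use '__' and the string uses '___' as separators.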

CodePudding user response：

You can use difflib to get the closest match:

import difflib

diction = { 'A_B_D_E_F':0,
            'B_C_E':0,
            'A_D_E':0
          }

string = 'A_E_B_F'

diction[difflib.get_close_matches(string, diction.keys())[0]] += 1

print(diction)

This gives the output:

{'A_B_D_E_F': 1, 'B_C_E': 0, 'A_D_E': 0}
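One thing to watch for: `difflib.get_close_matches` returns an empty list when no candidate reaches its `cutoff` (0.6 by default), so indexing the result with `[0]` can raise an `IndexError`. A minimal sketch of a guarded version follows; the function name `bump_closest` is hypothetical and only used for illustration.

```python
import difflib

def bump_closest(counts, query, cutoff=0.6):
    """Increment the count of the key most similar to query, if there is one.

    Returns the matched key, or None when no key reaches the cutoff.
    """
    matches = difflib.get_close_matches(query, list(counts), n=1, cutoff=cutoff)
    if not matches:
        return None
    counts[matches[0]] += 1
    return matches[0]

diction = {'A_B_D_E_F': 0, 'B_C_E': 0, 'A_D_E': 0}

print(bump_closest(diction, 'A_E_B_F'))  # A_B_D_E_F
print(bump_closest(diction, 'X_Y_Z'))    # None
print(diction)                           # {'A_B_D_E_F': 1, 'B_C_E': 0, 'A_D_E': 0}
```

Lowering `cutoff` makes the match more permissive; whether that is desirable depends on how dissimilar the real keys and strings can be.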

CodePudding user response：

As you said, you can use the Levenshtein distance, and yes, it does take the order of the characters into account. You can do this:

import jellyfish

# Pick the key with the smallest Levenshtein distance to `string`;
# on ties the first key in the dictionary wins.
closest_string = min(diction, key=lambda s: jellyfish.levenshtein_distance(string, s))

diction[closest_string] += 1
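A caveat with the raw distance: for the short example, 'A_B_D_E_F' and 'A_D_E' are both at distance 4 from 'A_E_B_F', so the winner is decided only by dictionary order. The sketch below is one possible way around that, assuming the distance is normalized by the length of the longer string to get a "percentage"; it uses a plain-Python `levenshtein` helper written for this illustration (not the jellyfish implementation) so that it runs without extra packages.

```python
def levenshtein(a, b):
    """Edit distance between two sequences; insert, delete and substitute cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute, or match at cost 0
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Normalized similarity in [0.0, 1.0]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

diction = {'A_B_D_E_F': 0, 'B_C_E': 0, 'A_D_E': 0}
string = 'A_E_B_F'

best_key = max(diction, key=lambda k: similarity(string, k))
diction[best_key] += 1
print(diction)  # {'A_B_D_E_F': 1, 'B_C_E': 0, 'A_D_E': 0}
```

With normalization, 'A_B_D_E_F' scores 1 - 4/9 and 'A_D_E' scores 1 - 4/7, so the longer key wins on its own merits rather than by position in the dictionary.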