python compare strings return difference

Time:12-28

Consider this sample data:

str_lst = ['abcdefg','abcdefghi']

I am trying to write a function that compares the two strings in this list and returns the difference, in this case 'hi'.

This attempt failed and simply returned both strings.

def difference(string1, string2):
    # Split both strings into list items
    string1 = string1.split()
    string2 = string2.split()

    A = set(string1) # Store all string1 list items in set A
    B = set(string2) # Store all string2 list items in set B
 
    str_diff = A.symmetric_difference(B)
    # isEmpty = (len(str_diff) == 0)
    return str_diff
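A likely reason the attempt returns both strings, shown as a minimal sketch: `str.split()` with no arguments splits on whitespace only, so each string here becomes a one-element list and the sets end up holding whole strings rather than characters. The variable names simply mirror the function above.

```python
# str.split() with no arguments splits on whitespace only, so a string
# without spaces comes back as a one-element list.
string1 = 'abcdefg'.split()      # ['abcdefg']
string2 = 'abcdefghi'.split()    # ['abcdefghi']

# Each set therefore holds one whole string, not individual characters.
A = set(string1)
B = set(string2)

# The two whole strings differ, so both survive the symmetric difference.
str_diff = A.symmetric_difference(B)
print(str_diff)
```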

There are several SO questions that claim to ask for this, but their answers simply return a list of the letters that differ between two strings. In my case, the strings share many identical characters at the start, and I only want the characters near the end that differ between the two.

Any ideas on how to accomplish this reliably? My exact situation is a list of very similar strings, let's say 10 of them. I want to take the first item in the list and compare it against each of the others in turn, collecting the differences (i.e. small substrings) in a list.
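One possible reading of that pairwise comparison, as a rough sketch: for each later string, skip the characters it shares with the first item at the start and keep the rest. The function names `suffix_after_shared_prefix` and `differences_from_first` are made up for illustration, and the sketch assumes a non-empty list whose strings differ only after a shared beginning.

```python
def suffix_after_shared_prefix(reference, other):
    """Return the part of `other` that follows its shared prefix with `reference`."""
    i = 0
    limit = min(len(reference), len(other))
    # Advance while both strings still agree character by character.
    while i < limit and reference[i] == other[i]:
        i += 1
    return other[i:]


def differences_from_first(strings):
    """Compare the first string against every later one (assumes a non-empty list)."""
    first = strings[0]
    return [suffix_after_shared_prefix(first, s) for s in strings[1:]]


str_lst = ['abcdefg', 'abcdefghi']
print(differences_from_first(str_lst))  # ['hi']
```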

I appreciate you taking the time to check out my question.

Some hypotheticals:

The strings in my dataset would all share identical initial characters; think of directory paths:

sample_lst = ['c:/universal/bin/library/file_choice1.zip', 
'c:/universal/bin/library/file_zebra1.doc',
'c:/universal/bin/library/file_alpha1.xlsx']

Running the ideal function on this list would yield a list with the following strings:

result = ['choice1.zip', 'zebra1.doc', 'alpha1.xlsx']

Thus, these are the strings that remain when you remove the characters that all three list items in sample_lst have in common at the start.

CodePudding user response:

OK, now that you have provided some additional clarification, I think I understand what you are looking for. Thanks. I would break the problem into two steps:

  1. Find the longest common initial substring of your input strings
  2. Find the "remainder" of each string after removing the longest common initial substring.

def updateCommonInitialSubstring(s,t):
    nchars = min(len(s), len(t))
    for i in range(nchars):
        if s[i] != t[i]:
            return s[:i]
    return s[:nchars]
    
def findLongestCommonInitialSubstring(strings):
    common = strings[0]
    for s in strings[1:]:
        common = updateCommonInitialSubstring(common, s)
    return common 

def findDistinctSuffixes(strings):
    # Find the longest common initial substring (prefix) of the strings.
    common = findLongestCommonInitialSubstring(strings)
    # Strip that prefix from each string; what remains is its distinct suffix.
    suffixes = []
    for s in strings:
        suffixes.append(s[len(common):])
    return suffixes

With this implementation the following test case passes:

def test_findDistinctSuffixes2():
    # Arrange
    sample_lst = ['c:/universal/bin/library/file_choice1.zip', 
                  'c:/universal/bin/library/file_zebra1.doc',
                  'c:/universal/bin/library/file_alpha1.xlsx']    
    # Act
    result = findDistinctSuffixes(sample_lst)
    # Assert
    assert result == ['choice1.zip', 'zebra1.doc', 'alpha1.xlsx']

which I think is what you were looking for.
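As a possible shortcut for step 1, the standard library's `os.path.commonprefix` also returns the longest common leading substring of a list of strings. Despite living in `os.path`, it compares character by character and not path component by path component, which is the behaviour wanted here. The sketch below is an alternative to the functions above, not a replacement for them, and the name `distinct_suffixes` is made up for illustration.

```python
import os.path


def distinct_suffixes(strings):
    """Strip the longest common leading substring from every string."""
    # commonprefix works character by character and returns '' for an empty list.
    common = os.path.commonprefix(strings)
    return [s[len(common):] for s in strings]


sample_lst = ['c:/universal/bin/library/file_choice1.zip',
              'c:/universal/bin/library/file_zebra1.doc',
              'c:/universal/bin/library/file_alpha1.xlsx']
print(distinct_suffixes(sample_lst))
# ['choice1.zip', 'zebra1.doc', 'alpha1.xlsx']
```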

CodePudding user response:

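Simple solution:

# Wrapped in a function (the name is illustrative) so that `return` is valid
def simple_difference(s, t):
   # Make s the longer string so the extra characters are looked up in it
   if(len(s) < len(t)):
      s, t = t, s
   diff = ""
   for i in s:
      # Collect every character of s that does not occur anywhere in t
      if(i not in t):
         diff += i
   return diff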

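Complex solution:

# Wrapped in a function (the name is illustrative) so that `return` is valid
def complex_difference(s, t):
   # Count how often each character occurs in s and in t
   s_hash = {}
   t_hash = {}
   for i in s:
      if(i not in s_hash):
         s_hash[i] = 1
      else:
         s_hash[i] += 1
   for i in t:
      if(i not in t_hash):
         t_hash[i] = 1
      else:
         t_hash[i] += 1
   # Return the first character of t whose count differs from its count in s,
   # or that does not occur in s at all (None if there is no such character)
   for i in t_hash:
      if(i in s_hash and (s_hash[i] != t_hash[i])):
         return i
      elif(i not in s_hash):
         return i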
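The same counting idea can be sketched more compactly with `collections.Counter` from the standard library. Subtracting two counters keeps only the characters that occur more often in the first one. Note that this compares character counts and ignores position, so it is not equivalent to stripping a shared prefix. The name `extra_characters` is made up for illustration.

```python
from collections import Counter


def extra_characters(s, t):
    """Return the characters that occur more often in t than in s."""
    # Counter subtraction drops every entry whose count is zero or negative.
    surplus = Counter(t) - Counter(s)
    # elements() repeats each character as many times as its surplus count.
    return ''.join(surplus.elements())


print(extra_characters('abcdefg', 'abcdefghi'))  # hi
```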