match elements in two strings when whitespace inserted in one of them

Time:09-21

I have a large number of pairs of strings, for example:

s1 = 'newyork city lights are yellow'
s2 = ' the city of new york is large'

I would like to write a function that gets s1 and s2 (regardless of the order) and outputs:

s1_output = 'new york city lights are yellow'
s2_output = 'the city of new york is large'

such that the newyork in s1 is separated into new york, or at least a way to find a token in one string that matches adjacent tokens in the other string once a single space is inserted.

The matched tokens are not known in advance and are not guaranteed to appear in the text. Any ideas?

CodePudding user response:

Something like this can work:

s1 = 'newyork city lights are yellow'
s2 = ' the city of new york is large'

# Get rid of leading/trailing whitespace
s1 = s1.strip()
# Split the string into a list of words; split() with no arguments splits on any run of whitespace
words_s1 = s1.split()

s2 = s2.strip()
words_s2 = s2.split()

# For each word in list 1, compare it to each pair of adjacent (concatenated) words in list 2
# enumerate() gives the position directly, so repeated words are reported correctly
for idx, word in enumerate(words_s1):
    for i in range(len(words_s2) - 1):
        if word == words_s2[i] + words_s2[i + 1]:
            print(f"Word #{idx} in s1 matches words #{i} and #{i + 1} in s2")

This matches up words in the way you described. The idea is to loop through the words of s1 and compare each one against every pair of adjacent words in s2, joined together.

You could also loop the opposite way (loop through s2 and check whether each word equals a pair of adjacent words in s1) to cover both directions.

You'd need to keep track of where the matches are, and then you just need to build a new string with that info.
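
One way to put those pieces together is sketched below. It is not a definitive implementation: the function names split_joined_words and normalize_pair are made up for illustration, and it assumes that only two adjacent words are ever joined and that any word equal to such a join should be split (so a real word like "into" would be split if the other string contains "in to").

```python
def split_joined_words(source, reference):
    """Split each word in `source` that equals two adjacent words of
    `reference` joined together. Hypothetical helper, for illustration."""
    src_words = source.split()
    ref_words = reference.split()

    # Map every concatenation of two adjacent reference words to the pair.
    # setdefault keeps the first pair seen if the same join occurs twice.
    joined = {}
    for left, right in zip(ref_words, ref_words[1:]):
        joined.setdefault(left + right, (left, right))

    # Rebuild the source, replacing a joined word with its two parts.
    result = []
    for word in src_words:
        if word in joined:
            result.extend(joined[word])
        else:
            result.append(word)
    return " ".join(result)


def normalize_pair(s1, s2):
    """Apply the split in both directions, so argument order does not matter."""
    return split_joined_words(s1, s2), split_joined_words(s2, s1)


s1 = 'newyork city lights are yellow'
s2 = ' the city of new york is large'
s1_output, s2_output = normalize_pair(s1, s2)
print(s1_output)  # new york city lights are yellow
print(s2_output)  # the city of new york is large
```

Building the dictionary of joined pairs once keeps each lookup cheap, which may matter with a large number of string pairs; the nested loop from the answer above does the same comparison without the extra dictionary.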
