Join items in list that occur before and after keyword (Python)

I'm using a named entity recognition (NER) model to find names in a text string. For hyphenated names like Jane Miller-Smith, the NER model returns the parts separately, like this:

names = ['Jane','Miller','-','Smith']

What's a simple way to join the items before and after the '-' into one string in this list, so that I get a list of first and last name like name = ['Jane', 'Miller-Smith']?

I've so far tried to loop through the list of names based on solutions like this for different hyphenated name versions:

name1 = ['Jane', 'Miller', '-','Smith']
name = ['Jane', '-', 'Marie','Miller', '-','Smith']

new_name = []

for cur, nxt in zip(name, name[1:]):
    print(cur, nxt)
    if cur == '-':
        hyph = cur + nxt
        new_name.append(hyph)
        print("hyph: ", hyph)
    else:
        new_name.append(cur)
        print("cur: ", cur)
print(new_name)

But I can't wrap my head around how to combine only the strings directly before and after the hyphen while keeping the other, non-hyphenated strings in their original order (so that the last name doesn't suddenly come first).

CodePudding user response：

Here the trick would be to join the list with a field delimiter you won't find in your list (e.g., |).

Then, you replace the pattern |-| with - and you split back using your field delimiter.

names = ['Jane', '-', 'Marie','Miller', '-','Smith']

print('|'.join(names).replace('|-|', '-').split('|'))

Output:

['Jane-Marie', 'Miller-Smith']
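
The one-liner above depends on the delimiter never appearing inside a token. A minimal sketch of the same join/replace/split trick with that check made explicit; the function name merge_hyphenated and its sep parameter are made up for illustration:

```python
def merge_hyphenated(tokens, sep='|'):
    """Merge 'A', '-', 'B' into 'A-B' via join/replace/split."""
    # The trick breaks if the separator occurs inside any token.
    if any(sep in tok for tok in tokens):
        raise ValueError(f"separator {sep!r} occurs in a token")
    # ''.split(sep) would return [''], so handle the empty list separately.
    if not tokens:
        return []
    return sep.join(tokens).replace(f'{sep}-{sep}', '-').split(sep)


print(merge_hyphenated(['Jane', 'Miller', '-', 'Smith']))
print(merge_hyphenated(['Jane', '-', 'Marie', 'Miller', '-', 'Smith']))
```

If '|' could show up in your data, pass a different sep that you know is absent.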

CodePudding user response：

Scan from right to left, replacing the three-element slices whenever a hyphen is found:

>>> names = ['Jane', '-', 'Marie','Miller', '-','Smith']
>>> for i in reversed(range(len(names))):
        if names[i] == '-':
            names[i-1:i+2] = [f'{names[i-1]}-{names[i+1]}']

>>> names
['Jane-Marie', 'Miller-Smith']

An alternative is to loop left-to-right and build a new result list:

>>> names = ['Jane', '-', 'Marie', 'Miller', '-','Smith']
>>> result = []
>>> it = iter(names)
>>> for tok in it:
        if tok == '-':
            tok = result.pop() + '-' + next(it)
        result.append(tok)

>>> result
['Jane-Marie', 'Miller-Smith']
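
The left-to-right loop can also be wrapped in a function that leaves the input list untouched. This is only a sketch: the name join_hyphenated is made up, and the handling of stray hyphens is an assumption (a leading '-' is kept as its own item, a trailing '-' is glued to the previous token), not something the question specifies.

```python
def join_hyphenated(tokens):
    """Return a new list with 'A', '-', 'B' runs merged into 'A-B'."""
    result = []
    it = iter(tokens)
    for tok in it:
        # Only merge when there is a previous token to attach to.
        if tok == '-' and result:
            # next(it, '') avoids StopIteration if '-' is the last token.
            tok = result.pop() + '-' + next(it, '')
        result.append(tok)
    return result


names = ['Jane', '-', 'Marie', 'Miller', '-', 'Smith']
print(join_hyphenated(names))
print(names)  # the original list is not modified
```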

CodePudding user response：

Using an iterator and itertools:

from itertools import chain, pairwise
# for python <3.10, check the pairwise recipe:
# https://docs.python.org/3/library/itertools.html#itertools.pairwise
# or iterator = zip(names, names[1:] + [''])

names = ['Jane', '-', 'Marie', 'John', 'Miller', '-','Smith']

out = []
iterator = pairwise(chain(names, ['']))
for (a, b) in iterator:
    if b == '-':
        out.append(a + next(iterator)[0] + next(iterator)[0])
    else:
        out.append(a)
        
out

compact version:

iterator = pairwise(chain(names, ['']))

out = [a + next(iterator)[0] + next(iterator)[0] if b == '-' else a
       for (a, b) in iterator]

output: ['Jane-Marie', 'John', 'Miller-Smith']
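
For Python versions before 3.10, a sketch of the same loop using the tee-based pairwise recipe from the itertools documentation. The wrapper name merge_with_pairs is made up for illustration, and the code assumes every '-' has a token on both sides; a trailing '-' would exhaust the iterator and raise StopIteration.

```python
from itertools import chain, tee


def pairwise(iterable):
    # Equivalent of itertools.pairwise for older Pythons (docs recipe).
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)


def merge_with_pairs(names):
    out = []
    # The extra '' gives the last real token a partner in the final pair.
    iterator = pairwise(chain(names, ['']))
    for a, b in iterator:
        if b == '-':
            # Consume the ('-', x) and (x, y) pairs, keeping their first items.
            out.append(a + next(iterator)[0] + next(iterator)[0])
        else:
            out.append(a)
    return out


print(merge_with_pairs(['Jane', '-', 'Marie', 'John', 'Miller', '-', 'Smith']))
```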

CodePudding user response：

You need to keep a stack and check each item against the - symbol. When the item on top of the stack is a -, pop it together with the word before it and join them with the current word into one string.

name = ['Jane', '-', 'Marie','Miller', '-','Smith']

result = []
for word in name:
    if result and result[-1] !='-':
        result.append(word)
    else:
        symbol = ''
        if result:
            symbol = result.pop()
        word2 = ''
        if result:
            word2 = result.pop()
        new_word = ''.join([word2, symbol, word])
        result.append(new_word)
print(result)

output

['Jane-Marie', 'Miller-Smith']

CodePudding user response：

def solution(the_list: list[str]) -> list[str]:
    # Merge one hyphen at a time until none are left; mutates and returns the_list.
    while '-' in the_list:
        hyphen_index = the_list.index('-')
        text_before_hyphen = the_list[hyphen_index - 1]
        text_after_hyphen = the_list[hyphen_index + 1]
        x = text_before_hyphen + '-' + text_after_hyphen
        # Replace the three items by position; list.remove() deletes the first
        # equal value, which can be the wrong item if a name part repeats.
        the_list[hyphen_index - 1:hyphen_index + 2] = [x]
    return the_list


print(solution(['Jane', 'Miller', '-', 'Smith']))
print(solution(['Jane', '-', 'Marie', 'Miller', '-', 'Smith']))

The output will be like this.

python3 main.py 
['Jane', 'Miller-Smith']
['Jane-Marie', 'Miller-Smith']