How can I split concatenated strings that contain no delimiters in python?

Time:10-17

Let's say I have a list of concatenated first-name/last-name combinations, like ["samsmith","sallyfrank","jamesandrews"].

I also have two lists, possible_firstnames and possible_lastnames.

If I want to split those full-name strings based on the values in possible_firstnames and possible_lastnames, what is the best way to do it?

My initial strategy was to compare each full-name string, character by character, against each value in possible_firstnames/possible_lastnames and split the string as soon as I found a match. However, I realized this breaks if, for example, "Sal" is one of the possible first names: my code would turn "sallyfrank" into "Sal Lyfrank", and so on.

My next step would be to check whether what remains of the string after "sal" is in possible_lastnames before finalizing the split, but this is starting to get convoluted, so I wonder whether there is a much simpler option I have been overlooking from the start.

The language that I am working in is Python.
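The cross-check described above (accept a first-name prefix only if the remainder is a known last name, otherwise keep trying) can be sketched as follows. This is a minimal sketch, assuming both lists are lowercase like the example strings; the function name split_full_name and the sample lists (including "sal") are hypothetical.

```python
def split_full_name(full_name, first_names, last_names):
    """Return (first, last) for the first known first name whose remainder
    is a known last name, or None if no such split exists."""
    last_set = set(last_names)  # fast membership checks for the remainder
    for first in first_names:
        if full_name.startswith(first):
            rest = full_name[len(first):]
            # Accept the split only if what remains is a known last name;
            # otherwise keep trying other first names ("sal" vs "sally").
            if rest in last_set:
                return first, rest
    return None


# Hypothetical sample data
possible_firstnames = ["sal", "sally", "sam", "james"]
possible_lastnames = ["smith", "frank", "andrews"]

for name in ["samsmith", "sallyfrank", "jamesandrews", "bobjones"]:
    print(name, "->", split_full_name(name, possible_firstnames, possible_lastnames))
```

Because "sal" leaves "lyfrank", which is not a known last name, the loop moves on and "sallyfrank" is split as ("sally", "frank") regardless of the order of the first-name list.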

Answer:

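If some first names are prefixes of others, like sam, saman and samantha, order the list so that longer names come before their shorter prefixes (samantha, then saman, then sam). Otherwise "sam" would match every one of those strings first.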

full_names = ["samsmith","sallyfrank","jamesandrews", "samanthasang", "samantorres"]
# Longer names come before their prefixes: samantha, then saman, then sam
first_name = ["sally","james", "samantha", "saman", "sam"]

matches = []

for name in full_names:
    for first in first_name:
        # The first matching prefix wins; the rest of the string is the last name
        if name.startswith(first):
            matches.append(f'{first} {name[len(first):]}')
            break

print(*matches, sep='\n')

Result

sam smith
sally frank
james andrews
samantha sang
saman torres
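Rather than ordering first_name by hand, the longest-first order can be produced with sorted(..., key=len, reverse=True). A minimal sketch using the same sample data, with the first-name list deliberately given in arbitrary order; first_by_length is a hypothetical name.

```python
full_names = ["samsmith", "sallyfrank", "jamesandrews", "samanthasang", "samantorres"]
first_name = ["sam", "sally", "saman", "james", "samantha"]  # any order

# Longest candidates first, so "samantha" is tried before "saman" and "sam"
first_by_length = sorted(first_name, key=len, reverse=True)

matches = []
for name in full_names:
    for first in first_by_length:
        if name.startswith(first):
            matches.append(f"{first} {name[len(first):]}")
            break

print(*matches, sep="\n")
```

Python's sort is stable, so names of equal length keep their original relative order; only length decides which prefix is tried first.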

This won't pick out a name like Sam Antony: it would show it as "saman tony". In that case your idea of checking the remainder against the last-name list would help.

It also can't resolve samanthanei, which could be Samantha Nei, Saman Thanei or Sam Anthanei if all three surnames were in your surname list.
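One way to surface these ambiguous cases, rather than silently picking one split, is to collect every split point where both halves are known names. A sketch assuming both lists are lowercase; all_splits and the sample name lists are hypothetical.

```python
def all_splits(full_name, first_names, last_names):
    """Return every (first, last) pair such that first + last == full_name,
    first is a known first name and last is a known last name."""
    firsts, lasts = set(first_names), set(last_names)
    return [
        (full_name[:i], full_name[i:])
        for i in range(1, len(full_name))  # each possible split position
        if full_name[:i] in firsts and full_name[i:] in lasts
    ]


# Hypothetical sample data
firsts = ["sam", "saman", "samantha"]
lasts = ["nei", "thanei", "anthanei", "tony", "antony", "smith"]

for name in ["samanthanei", "samantony", "samsmith"]:
    candidates = all_splits(name, firsts, lasts)
    if len(candidates) > 1:
        print(name, "is ambiguous:", candidates)
    elif candidates:
        print(name, "->", candidates[0])
    else:
        print(name, "-> no match")
```

With these lists, samanthanei yields all three readings and samantony yields both ("sam", "antony") and ("saman", "tony"), so those strings can be flagged for manual review instead of being split arbitrarily.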

Answer:

Is this what you wanted?

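names = ["samsmith","sallyfrank","jamesandrews"]
pos_fname = ["sally","james"]
pos_lname = ["smith","frank"]

matches = []

for i in names:
    # Find a known first name that the string starts with
    for first in pos_fname:
        if i.startswith(first):
            break
    else:
        continue  # no known first name: skip this string

    # Accept only a known last name that covers the rest of the string exactly
    for last in pos_lname:
        if i.endswith(last) and len(first) + len(last) == len(i):
            matches.append(f"{first.upper()} {last.upper()}")
            break

print(matches)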
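A different technique, not used in either answer, is to build a regular expression from the two lists and let the re module's backtracking find a split where both halves are known names. A sketch assuming the lists contain plain strings (re.escape guards against any special characters); the helper alternation and the extra sample names are hypothetical.

```python
import re

# Hypothetical sample data
pos_fname = ["sally", "james", "sam"]
pos_lname = ["smith", "frank", "andrews"]


def alternation(words):
    # Longest first, so that when several alternatives could match,
    # the regex tries the longer name before its prefixes
    return "|".join(map(re.escape, sorted(words, key=len, reverse=True)))


# (first)(last) must cover the whole string; if the remainder after a
# first name is not a known last name, the regex backtracks and tries
# the next first-name alternative
pattern = re.compile(f"({alternation(pos_fname)})({alternation(pos_lname)})")

for name in ["samsmith", "sallyfrank", "jamesandrews", "bobjones"]:
    m = pattern.fullmatch(name)
    print(name, "->", m.groups() if m else None)
```

If several splits are valid, the alternation order decides which one is returned, so this approach does not flag ambiguous names on its own.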
