How to separate a string with commas after vowel or space and append to an array?

I need to split a string after every vowel or space (conceptually, insert a comma there) and append each piece to an array. I tried the code below, but it does not give me what I want. For example, if I enter "a mambo jambo", the output should be ['a', 'ma', 'mbo', 'ja', 'mbo'].

This is my code:

text = input("Please enter text : ")
vowel = ["a", "e", "i", "o", "u"]
final_string = []
not_vowel = ""
a_vowel = ""
for text_in in text:
    if text_in not in vowel:
        not_vowel = text_in
    if text_in in vowel:
        a_vowel = text_in
    final_string.append(f'{not_vowel}{a_vowel}')

print(final_string)

And this is the output for the input "a mambo jambo":

['a', ' a', 'ma', 'ma', 'ma', 'ba', 'bo', ' o', 'jo', 'ja', 'ma', 'ba', 'bo']

What I want is:

['a', 'ma', 'mbo', 'ja', 'mbo']
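The loop above appends to final_string on every character, and not_vowel and a_vowel only ever hold a single character, so the 13-character input produces 13 consonant/vowel pairs. A minimal sketch of one possible fix that keeps the question's names: build the current piece in a buffer, append it only when a vowel ends it, drop spaces, and flush any trailing consonants at the end. The function name split_after_vowels is hypothetical, and like the question it assumes lowercase input.

```python
vowel = ["a", "e", "i", "o", "u"]


def split_after_vowels(text):
    """Split text after every vowel or space (hypothetical helper)."""
    final_string = []
    current = ""  # the piece being built
    for text_in in text:
        if text_in == " ":
            # A space ends the current piece but is not kept
            if current:
                final_string.append(current)
            current = ""
        elif text_in in vowel:
            # A vowel is kept and then ends the piece
            final_string.append(current + text_in)
            current = ""
        else:
            # Consonants accumulate until a vowel or space arrives
            current += text_in
    # Trailing consonants after the last vowel form their own piece
    if current:
        final_string.append(current)
    return final_string


print(split_after_vowels("a mambo jambo"))  # ['a', 'ma', 'mbo', 'ja', 'mbo']
```

With input(), the same function can be called as split_after_vowels(input("Please enter text : ")).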

Answer:

Far from optimal, but I hope it helps:

text = "a mambo jambo"
vowels = ["a", "e", "i", "o", "u"]

out_text = text
for vowel in vowels:
    out_text = f'{vowel}|'.join(out_text.split(vowel))

out_text = out_text if out_text[-1] != '|' else out_text[:-1]
print(out_text.replace(" ", '').split('|'))

OUTPUT:

['a', 'ma', 'mbo', 'ja', 'mbo']

If it works, don't forget to accept the answer.
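Note that this approach only inserts the "|" marker after vowels. Spaces are deleted afterwards but never act as separators, so it works here because every space in "a mambo jambo" happens to follow a vowel; for "ab cd" it gives ['a', 'bcd']. A sketch of a variant that treats the space as one more separator and filters out empty pieces, assuming the input never contains "|" itself; the name split_after and its separators parameter are hypothetical.

```python
def split_after(text, separators="aeiou "):
    """Hypothetical variant of the split/join answer that also splits on spaces."""
    marked = text
    for ch in separators:
        # Put a "|" marker after every occurrence of this separator.
        # Assumes the input never contains "|" itself.
        marked = f"{ch}|".join(marked.split(ch))
    # Strip spaces from each piece and drop pieces that end up empty
    return [piece.strip() for piece in marked.split("|") if piece.strip()]


print(split_after("a mambo jambo"))  # ['a', 'ma', 'mbo', 'ja', 'mbo']
print(split_after("ab cd"))          # ['a', 'b', 'cd']
```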

Answer:

Peter Trcka provides an interesting answer, but here is another approach. It isn't necessarily better, but it may be clearer.

s = "a mambo jambo"
vowels = ["a", "e", "i", "o", "u"]

cur = ""
new = []
for c in s:
    if c == " ":
        if cur != "":
            new.append(cur)
        cur = ""
    elif c in vowels:
        cur  = c
        new.append(cur)
        cur = ""
    else:
        cur  = c

print(new)

Here is another method. It is noticeably slower (see the timings below).

s = "a mambo jambo"
vowels = ["a", "e", "i", "o", "u"]

new = []
i = 0
for c in s:
    if len(new) == i:
        if c == " ":
            continue
        new.append("")
    new[i]  = c if c != " " else ""
    if c in vowels   [" "]:
        i  = 1

print(new)

For posterity, I converted both of these, Peter Trcka's answer, and JonSG's answer into functions and ran them through timeit. The results are:

Peter Trcka's answer: 1.96 (faster)
my first method: 2.15
my second method: 4.24
JonSG's answer: 8.9 (slowest, often an issue with regular expressions including backreferences)

Answer:

While the accepted answer is perfectly fine, I would use the re module and a list comprehension, as I believe that gives a solution that is easier to understand.

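import re

def to_sylables(text):
    match_pattern = r"([aeiou ])"  # a vowel or a space
    replace_pattern = r"\1\t"  # replace the match with itself and a tab
    return [
        x.strip() for x  # strip spaces left on pieces such as "b " in "ab cd"
        in re.sub(match_pattern, replace_pattern, text).split("\t")
        if x.strip()  # skip empty and space-only pieces
    ]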

text = "a mambo jambo"
print(f"\"{text}\" ==> {to_sylables(text)}")

This will give you:

"a mambo jambo" ==> ['a', 'ma', 'mbo', 'ja', 'mbo']

@theherk has timed the answers so far, and I thought it might be interesting to look at how that is done. My main motivation is that they report my answer as the slowest by a wide margin. While I was not expecting to "win" a speed race with a regex, I was surprised that it would be that much slower.

The good news (for me) is that while my answer is still the slowest, it is not as slow as reported. I suspect @theherk may be including the cost of the import re statement in the timing, which might or might not be fair.
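Whether an import inside the timed code matters can be measured directly. A minimal sketch, assuming the import would sit inside the function being timed: after the first import, import re is resolved from the module cache (sys.modules), so it adds a per-call lookup rather than a full re-import. The names split_regex, split_regex_with_import and best_time are hypothetical, passing callables to timeit.repeat is one harness choice among several, and the numbers will vary by machine and Python version.

```python
import re
import timeit

SAMPLE = "a mambo jambo"


def split_regex(text):
    # Essentially the answer's approach: add a tab after each vowel or space, then split
    return [x for x in re.sub(r"([aeiou ])", r"\1\t", text).split("\t") if x.strip()]


def split_regex_with_import(text):
    # Hypothetical variant that repeats the import on every call; after the
    # first call this only finds the already-loaded module in sys.modules
    import re
    return [x for x in re.sub(r"([aeiou ])", r"\1\t", text).split("\t") if x.strip()]


def best_time(func, number=50_000, repeat=3):
    """Fastest of `repeat` runs of `number` calls, in seconds."""
    return min(timeit.repeat(lambda: func(SAMPLE), number=number, repeat=repeat))


print(f"import outside: {best_time(split_regex):.3f} s")
print(f"import inside:  {best_time(split_regex_with_import):.3f} s")
```

Taking the minimum of several repeats reduces noise from other processes; it does not remove the difference between machines.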

If you want to time the various answers with timeit yourself, try:

import timeit

setup_jonsg = '''
import re
text = "a mambo jambo"
def to_sylables(text):
    match_pattern = r"([aeiou ])"
    replace_pattern = r"\1\t"
    return [
        x for x
        in re.sub(match_pattern, replace_pattern, text).split("\t")
        if x.strip()
    ]
'''

setup_tricka = '''
text = "a mambo jambo"
def to_sylables(text):
    vowels = ["a", "e", "i", "o", "u"]
    out_text = text
    for vowel in vowels:
        out_text = f'{vowel}|'.join(out_text.split(vowel))
    out_text = out_text if out_text[-1] != '|' else out_text[:-1]
    return out_text
'''

setup_theherk = '''
text = "a mambo jambo"
def to_sylables(text):
    vowels = ["a", "e", "i", "o", "u"]
    cur = ""
    new = []
    for c in text:
        if c == " ":
            if cur != "":
                new.append(cur)
            cur = ""
        elif c in vowels:
            cur += c
            new.append(cur)
            cur = ""
        else:
            cur += c
    return new
'''

print(f"jonsg: {timeit.timeit('to_sylables(text)', setup=setup_jonsg, number=1_000_000):.2f}")
print(f"tricka: {timeit.timeit('to_sylables(text)', setup=setup_tricka, number=1_000_000):.2f}")
print(f"theherk: {timeit.timeit('to_sylables(text)', setup=setup_theherk, number=1_000_000):.2f}")

For me, this reports results like:

jonsg: 2.57
tricka: 1.33
theherk: 2.05

So I'm still the slowest, but I would argue that one should implement the easiest solution to understand (which might not be mine) prior to optimizing for what might be negligible performance gains.