I need to split a string after every vowel or space and append each piece to a list. I tried the code below, but it does not give me what I want. For example, if I enter "a mambo jambo", the output should be ['a', 'ma', 'mbo', 'ja', 'mbo'].
This is my code:
text = input("Please enter text : ")
vowel = ["a", "e", "i", "o", "u"]
final_string = []
not_vowel = ""
a_vowel = ""
for text_in in text:
    if text_in not in vowel:
        not_vowel = text_in
    if text_in in vowel:
        a_vowel = text_in
    final_string.append(f'{not_vowel}{a_vowel}')
print(final_string)
And this is the output for the input "a mambo jambo":
['a', ' a', 'ma', 'ma', 'ma', 'ba', 'bo', ' o', 'jo', 'ja', 'ma', 'ba', 'bo']
What I want is:
['a', 'ma', 'mbo', 'ja', 'mbo']
Answer:
This is far from optimal, but I hope it helps:
text = "a mambo jambo"
vowels = ["a", "e", "i", "o", "u"]
out_text = text
for vowel in vowels:
    out_text = f'{vowel}|'.join(out_text.split(vowel))
out_text = out_text if out_text[-1] != '|' else out_text[:-1]
print(out_text.replace(" ", '').split('|'))
OUTPUT:
['a', 'ma', 'mbo', 'ja', 'mbo']
If it works, don't forget to accept the answer.
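The split/join step above inserts a "|" after every occurrence of a vowel, which gives the same result as str.replace(vowel, vowel + "|"). Here is a sketch of the same approach written that way; the function name split_after_vowels is hypothetical, and, like the answer, it assumes "|" never appears in the input. Filtering out empty pieces takes the place of the trailing-"|" check, which raises an IndexError on an empty string.

```python
def split_after_vowels(text):
    """Mark the end of every chunk with '|', then split on the markers."""
    marked = text
    for vowel in "aeiou":
        # same result as f'{vowel}|'.join(marked.split(vowel))
        marked = marked.replace(vowel, vowel + "|")
    # remove spaces, split on the markers, and drop the empty piece
    # that a trailing '|' leaves behind
    return [piece for piece in marked.replace(" ", "").split("|") if piece]


print(split_after_vowels("a mambo jambo"))  # ['a', 'ma', 'mbo', 'ja', 'mbo']
print(split_after_vowels(""))               # []
```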
Answer:
Peter Trcka provides an interesting answer, but here is another approach. It isn't necessarily better, but it may be clearer.
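s = "a mambo jambo"
vowels = ["a", "e", "i", "o", "u"]
cur = ""
new = []
for c in s:
    if c == " ":
        # a space ends the current chunk (if any) and is dropped
        if cur != "":
            new.append(cur)
            cur = ""
    elif c in vowels:
        # a vowel ends the current chunk and is kept in it
        cur += c
        new.append(cur)
        cur = ""
    else:
        # consonants accumulate until the next vowel or space
        cur += c
print(new)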
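One detail of the loop above: consonants that come after the last vowel are still in cur when the loop ends, so they never reach new (for example, "mambo jamb" gives ['ma', 'mbo', 'ja'] and the final "mb" is lost). If those leftovers should be kept as their own chunk, which is an assumption about the desired behavior, a sketch of the same loop with a final flush could look like this (split_chunks is a hypothetical name):

```python
def split_chunks(s):
    vowels = "aeiou"
    cur = ""  # characters collected since the last split point
    new = []
    for c in s:
        if c == " ":
            # a space ends the current chunk (if any) and is itself dropped
            if cur:
                new.append(cur)
                cur = ""
        elif c in vowels:
            # a vowel ends the current chunk and is kept in it
            new.append(cur + c)
            cur = ""
        else:
            cur += c
    # flush consonants left over after the last vowel or space
    if cur:
        new.append(cur)
    return new


print(split_chunks("a mambo jambo"))  # ['a', 'ma', 'mbo', 'ja', 'mbo']
print(split_chunks("mambo jamb"))     # ['ma', 'mbo', 'ja', 'mb']
```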
Here is another method. It is a bit slower.
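s = "a mambo jambo"
vowels = ["a", "e", "i", "o", "u"]
new = []
i = 0  # index of the chunk currently being built
for c in s:
    if len(new) == i:
        # no chunk started yet: skip leading spaces, otherwise start one
        if c == " ":
            continue
        new.append("")
    new[i] += c if c != " " else ""
    # a vowel or a space closes the current chunk
    if c in vowels + [" "]:
        i += 1
print(new)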
For posterity, I converted both of these, Peter Trcka's answer, and JonSG's answer into functions and ran them through timeit. The results are:
- Peter Trcka's answer: 1.96 (faster)
- my first method: 2.15
- my second method: 4.24
- JonSG's answer: 8.9 (slowest, often an issue with regular expressions including backreferences)
Answer:
While the accepted answer is perfectly fine, I would use the re module and a list comprehension, as I believe that gives a solution that is easier to understand.
import re

def to_sylables(text):
    match_pattern = r"([aeiou ])"
    replace_pattern = r"\1\t"  ## replace the match with itself and a tab
    return [
        x for x
        in re.sub(match_pattern, replace_pattern, text).split("\t")
        if x.strip()
    ]

text = "a mambo jambo"
print(f"\"{text}\" ==> {to_sylables(text)}")
This will give you:
"a mambo jambo" ==> ['a', 'ma', 'mbo', 'ja', 'mbo']
@theherk has timed the various answers so far, and I thought it would be interesting to look at how that is done. My main motivation is that they report my answer as the slowest by a wide margin, and while I did not expect to "win" a speed race with a regex, I was surprised that it would be that much slower.
The good news (for me) is that while my answer is still the slowest, it is not as slow as reported. I suspect @theherk may be including the cost of import re in the timing, which might or might not be fair.
If you want to time the various answers yourself with timeit, try:
import timeit

# raw string so that \1 and \t reach re.sub unchanged
setup_jonsg = r'''
import re
text = "a mambo jambo"
def to_sylables(text):
    match_pattern = r"([aeiou ])"
    replace_pattern = r"\1\t"
    return [
        x for x
        in re.sub(match_pattern, replace_pattern, text).split("\t")
        if x.strip()
    ]
'''

setup_tricka = '''
text = "a mambo jambo"
def to_sylables(text):
    vowels = ["a", "e", "i", "o", "u"]
    out_text = text
    for vowel in vowels:
        out_text = f'{vowel}|'.join(out_text.split(vowel))
    out_text = out_text if out_text[-1] != '|' else out_text[:-1]
    return out_text.replace(" ", '').split('|')
'''

setup_theherk = '''
text = "a mambo jambo"
def to_sylables(text):
    vowels = ["a", "e", "i", "o", "u"]
    cur = ""
    new = []
    for c in text:
        if c == " ":
            if cur != "":
                new.append(cur)
                cur = ""
        elif c in vowels:
            cur += c
            new.append(cur)
            cur = ""
        else:
            cur += c
    return new
'''

print(f"jonsg: {timeit.timeit('to_sylables(text)', setup=setup_jonsg, number=1_000_000):.2f}")
print(f"tricka: {timeit.timeit('to_sylables(text)', setup=setup_tricka, number=1_000_000):.2f}")
print(f"theherk: {timeit.timeit('to_sylables(text)', setup=setup_theherk, number=1_000_000):.2f}")
For me, this reports results like:
jonsg: 2.57
tricka: 1.33
theherk: 2.05
So I'm still the slowest, but I would argue that one should implement the easiest solution to understand (which might not be mine) prior to optimizing for what might be negligible performance gains.
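When comparing timings like these, two things can make the numbers more trustworthy: checking that every candidate returns the same list before timing it, and using timeit.repeat and keeping the minimum, which the timeit documentation recommends because higher values are usually caused by other processes rather than by the code itself. The sketch below passes callables to timeit, so no setup strings (and no backslash escaping inside them) are needed, and the regexes are compiled once outside the timed code. The function names are hypothetical condensed versions of the answers above, plus one alternative not taken from any answer: findall_split, which matches the chunks directly with re.findall (a run of non-vowel, non-space characters ending in a vowel, or failing that a run of such characters with no vowel after it). The candidates agree on the example input but not on every input; the loop version, like the answer it condenses, drops consonants after the last vowel. The printed times will vary by machine.

```python
import re
import timeit

VOWELS = "aeiou"
# compiled once, outside the timed code
SUB_PATTERN = re.compile(r"([aeiou ])")
# a chunk: consonants followed by one vowel, or a run of consonants with no vowel after it
CHUNK_PATTERN = re.compile(r"[^aeiou ]*[aeiou]|[^aeiou ]+")


def regex_split(text):
    # re.sub answer: put a tab after every vowel or space, split, drop blank pieces
    return [x for x in SUB_PATTERN.sub(r"\1\t", text).split("\t") if x.strip()]


def findall_split(text):
    # alternative (not from the answers): match the chunks directly
    return CHUNK_PATTERN.findall(text)


def replace_split(text):
    # split/join answer, written with str.replace
    for vowel in VOWELS:
        text = text.replace(vowel, vowel + "|")
    return [p for p in text.replace(" ", "").split("|") if p]


def loop_split(text):
    # character-loop answer
    cur, new = "", []
    for c in text:
        if c == " ":
            if cur:
                new.append(cur)
                cur = ""
        elif c in VOWELS:
            new.append(cur + c)
            cur = ""
        else:
            cur += c
    return new


CANDIDATES = {
    "regex sub": regex_split,
    "regex findall": findall_split,
    "str.replace": replace_split,
    "loop": loop_split,
}

if __name__ == "__main__":
    text = "a mambo jambo"
    # check that every candidate agrees on the input before comparing speed
    results = {name: fn(text) for name, fn in CANDIDATES.items()}
    assert all(r == results["loop"] for r in results.values()), results

    for name, fn in CANDIDATES.items():
        # best of 5 runs of 20,000 calls each; stmt is a callable, so no setup string
        best = min(timeit.repeat(lambda: fn(text), repeat=5, number=20_000))
        print(f"{name}: {best:.3f} s")
```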