I wanted to make a Japanese transliteration program. I won't go into the details, but some characters have a different value when they appear as a pair than when they appear separately, so I made a loop that looks at two characters at a time (the current one and the next one):
b = "きゃきゃ"
b = list(b)
name = ""
for i in b:
    if b.index(i) + 1 <= len(b) - 1:
        if i in "き / キ" and b[b.index(i) + 1] in "ゃ ャ":
            if b[b.index(i) + 1] != " ":
                del b[b.index(i) + 1]
                del b[int(b.index(i))]
                cur = "kya"
                name += cur
print(name)
But it always gives "き" the index 0, so I can't check more than one occurrence. How can I change that?
I tried deleting each element after analyzing it, but that didn't help.
Answer:
If you are looking for the indices of 'き', use enumerate:
b = "きゃきゃ"
b = list(b)
indices = [i for i, x in enumerate(b) if x == "き"]
print(indices)  # [0, 2]
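Building on that, one way to fix the lookahead in the question is to walk the string by position instead of calling b.index(i), since list.index always returns the first matching element. Below is a minimal sketch that assumes a hypothetical PAIRS/SINGLES table covering only the characters from the question. At each position it checks the current and next characters together. A matched pair advances two positions; anything else advances one.

```python
# Hypothetical tables for illustration: only the き/キ + ゃ/ャ characters from the question.
PAIRS = {"きゃ": "kya", "きャ": "kya", "キゃ": "kya", "キャ": "kya"}
SINGLES = {"き": "ki", "キ": "ki", "ゃ": "ya", "ャ": "ya"}


def transliterate_pairs(text: str) -> str:
    result = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]  # current and next character (only one char at the very end)
        if pair in PAIRS:
            result.append(PAIRS[pair])
            i += 2  # consume both characters of the matched pair
        else:
            # Unknown characters are passed through unchanged.
            result.append(SINGLES.get(text[i], text[i]))
            i += 1
    return "".join(result)


print(transliterate_pairs("きゃきゃ"))  # kyakya
```

Because the loop variable is a position rather than a character, repeated characters such as the second "きゃ" are handled independently, and the input list never has to be modified while iterating over it.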
Answer:
Rather than looking ahead one character, it may be easier to keep track of the previous character and replace the previous transliteration when you find a combo match.
Example (I'm not sure if I got all of the transliterations correct):
COMBOS = {('き', 'ゃ'): 'kya', ('き', 'ャ'): 'kya', ('キ', 'ゃ'): 'kya', ('キ', 'ャ'): 'kya'}
TRANSLITERATIONS = {'き': 'ki', 'キ': 'ki', 'ャ': 'ya', 'ゃ': 'ya'}

def transliterate(text: str) -> str:
    transliterated = []
    last = None
    for c in text:
        try:
            combo = COMBOS[(last, c)]
        except KeyError:
            transliterated.append(TRANSLITERATIONS.get(c, c))
        else:
            transliterated.pop()  # remove the last value that was added
            transliterated.append(combo)
        last = c
    return ''.join(transliterated)  # combine the transliterations into a single str
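The four COMBOS entries differ only in whether each character is written in hiragana or katakana. One possible tidy-up is to generate the table with itertools.product and use dict.get in place of try/except. This is a sketch of that idea and is not part of the original answer. The "kya" mapping is still the only combo it covers.

```python
from itertools import product

# Every hiragana/katakana spelling of き + ゃ maps to "kya"; product() yields
# ('き', 'ゃ'), ('き', 'ャ'), ('キ', 'ゃ'), ('キ', 'ャ') instead of listing them by hand.
COMBOS = {pair: "kya" for pair in product("きキ", "ゃャ")}
TRANSLITERATIONS = {"き": "ki", "キ": "ki", "ャ": "ya", "ゃ": "ya"}


def transliterate(text: str) -> str:
    transliterated = []
    last = None
    for c in text:
        combo = COMBOS.get((last, c))
        if combo is None:
            # No combo with the previous character: use the single-character value.
            transliterated.append(TRANSLITERATIONS.get(c, c))
        else:
            transliterated.pop()  # drop the single-character value added for `last`
            transliterated.append(combo)
        last = c
    return "".join(transliterated)


print(transliterate("きゃキャ"))  # kyakya
```

Adding another pair (for example a hypothetical し/シ + ゃ/ャ → "sha") would then be one more dict comprehension merged into COMBOS rather than four hand-written entries.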
That being said, rather than reinventing the wheel, it may make more sense to use an existing library that already transliterates Japanese to romaji, such as pykakasi.
Example:
>>> import pykakasi
>>> kks = pykakasi.kakasi()
>>> kks.convert('きゃ')
[{'orig': 'きゃ', 'hira': 'きゃ', 'kana': 'キャ', 'hepburn': 'kya', 'kunrei': 'kya', 'passport': 'kya'}]
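As the output shows, convert returns a list of dicts rather than a single string. To get one romaji string, you can join the 'hepburn' field of each item. Here is a minimal sketch using a hypothetical helper named to_hepburn. It takes the list as input, so it runs here on the sample output above without pykakasi installed.

```python
from typing import Iterable, Mapping


def to_hepburn(segments: Iterable[Mapping[str, str]]) -> str:
    # Join the Hepburn romanization of every item returned by convert().
    return "".join(segment["hepburn"] for segment in segments)


# Sample input copied from the kks.convert('きゃ') output shown above.
sample = [{'orig': 'きゃ', 'hira': 'きゃ', 'kana': 'キャ', 'hepburn': 'kya', 'kunrei': 'kya', 'passport': 'kya'}]
print(to_hepburn(sample))  # kya
```

With pykakasi installed, the same helper would be called as to_hepburn(kks.convert(text)). Swapping "hepburn" for "kunrei" or "passport" selects one of the other romanizations present in each dict.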