Need help for p-language deciphering

I don't know if you're familiar with the P-language or if it's something that's just known in my country. Basically, every time you come across a vowel in a word, you replace the vowel with the same vowel, then a p, then the vowel again.

So 'home' would be 'hopomepe' in the p-language. Now I'm tasked to decipher the p-language and turn sentences that are written in the p-language back to normal.

p = str(input())

for letter in range(1, len(p)):
    if p[letter]=='.':
        break
    if p[letter-1]==p[letter+1] and p[letter]=='p':
        p = p[:letter-1] + p[letter+1:]
print(p)

This is my code so far, it works except I don't know how to make it work for double vowel sounds like 'io' in scorpion (scoporpiopion) for example.

Also when a sentence starts with a vowel, this code doesn't work on that vowel. For example 'Apan epelepephapant' becomes 'Apan elephant' with my code.

And my code crashes with "string index out of range" when the sentence doesn't end with '.', and it crashes every time when I leave out that if for the '.' case.

TL;DR: How can I change my code so it works for double vowels and at the start of my sentence?

EDIT: To clarify, like in my example, combination vowels should count as 1 vowel. Scorpion would be Scoporpiopion instead of Scoporpipiopon, boat would be boapoat, boot would be boopoot, ...

CodePudding user response：

You can do it using regular expressions:

import re

def decodePLanguage(p):
    return re.subn(r'([aeiou]+)p\1', r'\1', p, flags=re.IGNORECASE)[0]

In [1]: decodePLanguage('Apan epelepephapant')
Out[1]: 'An elephant'

In [2]: decodePLanguage('scoporpiopion')
Out[2]: 'scorpion'

This uses the re.subn function to replace all regex matches. It returns a (new_string, count) tuple, hence the [0].

In r'([aeiou]+)p\1', the [aeiou]+ part matches one or more vowels in a row, and \1 ensures you have the same combination after the p.

Then r'\1' is used to replace the whole match with the first vowel group.
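
As a quick check of the pattern, here is a minimal sketch that uses re.sub in place of re.subn, since re.sub returns only the string. The names decode_p and P_PATTERN are mine, not part of the answer above.

```python
import re

# Hypothetical names; same pattern as above, precompiled.
P_PATTERN = re.compile(r'([aeiou]+)p\1', flags=re.IGNORECASE)

def decode_p(text):
    # Replace "<vowels>p<same vowels>" with the first vowel group,
    # which keeps the case of the first occurrence ('Apa' -> 'A').
    return P_PATTERN.sub(r'\1', text)

print(decode_p('Apan epelepephapant'))  # An elephant
print(decode_p('boapoat boopoot'))      # boat boot
print(decode_p('papa'))                 # pa
```

The last call shows a limit of this approach: text that was never encoded but happens to contain vowel, p, same vowel gets collapsed too, so only feed it actual p-language.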

CodePudding user response：

EDIT: working code

def decipher(p):
    result = ''
    while len(p) > 2:
        # first strip out all the consecutive consonants each iteration
        idx = 0
        while p[idx].lower() not in 'aeiou' and idx < len(p) - 2:
            idx += 1
        result += p[:idx]
        p = p[idx:]
        # if there is any string remaining to process, that starts with a vowel
        if len(p) > 2:
            idx = 0
            # scan forward until 'p'
            while p[idx].lower() != 'p':
                idx += 1
            # sanity check
            if len(p) < (idx*2 + 1) or p[:idx].lower() != p[idx+1:2*idx+1].lower():
                raise ValueError
            result += p[:idx]
            p = p[2*idx+1:]
    result += p
    return result

In your example input 'Apan epelepephapant', you compare 'A' == 'a' and get False. It seems you want to compare 'a' == 'a', that is, the str.lower() of each.

It also seems you don't check if the character before the p and after the p is a vowel; that is, if you come across the string hph, as written, your function deciphers it to simply h.
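
To make both points concrete, here is a small sketch that compares the condition from the question with one that lowercases and requires a vowel. The function names are hypothetical, and each function only inspects the three characters around index i.

```python
VOWELS = 'aeiou'

def matches_original(s, i):
    # The check from the question: same character on both sides of a 'p'.
    return s[i] == 'p' and s[i - 1] == s[i + 1]

def matches_with_vowel_check(s, i):
    # Same idea, but case-insensitive and only when the neighbours are vowels.
    before, after = s[i - 1].lower(), s[i + 1].lower()
    return s[i] == 'p' and before in VOWELS and before == after

print(matches_original('hph', 1))          # True: 'hph' would wrongly collapse to 'h'
print(matches_with_vowel_check('hph', 1))  # False
print(matches_original('Apa', 1))          # False: 'A' != 'a'
print(matches_with_vowel_check('Apa', 1))  # True
```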

Earlier version of code below:

def decipher(p):
    result = ''
    while len(p) > 2:
        if p[0].lower() in 'aeiou' and p[0].lower() == p[2].lower() and p[1] == 'p':
            result += p[0]
            p = p[3:]
        else:
            result += p[0]
            p = p[1:]
    result += p
    return result

called as e.g.

p = str(input())
print(decipher(p))

CodePudding user response：

Since @Kolmar already gave a regex solution, I'm going to add one without regex.

To help think through this, I'm going to first show you my solution to encode a regular string into p-language. In this approach, I group the characters in the string by whether or not they are vowels using itertools.groupby(). This function groups consecutive elements having the same key in the same group.

import itertools

def p_encode(s):
    vowels = {'a', 'e', 'i', 'o', 'u'}
    s_groups = [(k, list(v)) for k, v in itertools.groupby(s, lambda c: c.lower() in vowels)]
    # For scorpion, this will look like this:
    # [(False, ['s', 'c']),
    #  (True, ['o']),
    #  (False, ['r', 'p']),
    #  (True, ['i', 'o']),
    #  (False, ['n'])]

    p_output = []

    # Now, we go over each group and do the encoding for the vowels.
    for is_vowel_group, group_chars in s_groups:
        p_output.extend(group_chars) # Add these chars to the output
        if is_vowel_group: # Special treatment for vowel groups
            p_output.append("p")
            p_output.extend(c.lower() for c in group_chars)

    return "".join(p_output)

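I added a list comprehension to define s_groups to show you how it works. You can skip the list comprehension and iterate directly with for is_vowel_group, group_chars in itertools.groupby(s, lambda c: c.lower() in vowels), but each group is then a one-shot iterator, so convert it to a list before using its characters twice.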
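
For reference, a sketch of that more compact variant, iterating over itertools.groupby() directly. The name p_encode_compact is hypothetical. The one thing to watch is that each group yielded by groupby is an iterator that can only be consumed once, so it is turned into a list before being used twice.

```python
import itertools

def p_encode_compact(s):
    vowels = {'a', 'e', 'i', 'o', 'u'}
    p_output = []
    for is_vowel_group, group in itertools.groupby(s, lambda c: c.lower() in vowels):
        # Materialise the one-shot iterator so the characters can be reused.
        group_chars = list(group)
        p_output.extend(group_chars)
        if is_vowel_group:
            p_output.append('p')
            p_output.extend(c.lower() for c in group_chars)
    return ''.join(p_output)

print(p_encode_compact('scorpion'))     # scoporpiopion
print(p_encode_compact('An elephant'))  # Apan epelepephapant
```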

Now, to decode this, we can do something similar in reverse, but this time manually do the grouping because we need to process ps differently when they're in the middle of a vowel group.

I suggest you don't modify the string as you're iterating over it. At best, you'll have written some code that's hard to understand. At worst, you'll have bugs because the loop will try to iterate over more indices than actually exist.
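
As an illustration of building a new result instead of editing the string in place, here is a minimal sketch. It is deliberately limited to single vowels ('opo' -> 'o') and does not handle groups like 'iopio'; the function name is hypothetical.

```python
def decode_single_vowels(p):
    # Sketch only: handles single vowels, not vowel groups.
    vowels = 'aeiou'
    out = []
    i = 0
    while i < len(p):
        c = p[i]
        # Only look ahead when two more characters actually exist.
        if (c.lower() in vowels
                and i + 2 < len(p)
                and p[i + 1] == 'p'
                and p[i + 2].lower() == c.lower()):
            out.append(c)  # keep the first vowel with its original case
            i += 3         # skip the 'p' and the repeated vowel
        else:
            out.append(c)
            i += 1
    return ''.join(out)

print(decode_single_vowels('Apan epelepephapant'))  # An elephant
print(decode_single_vowels('hopomepe'))             # home
```

Because the input is never changed, the index always refers to a position that exists, and the explicit bounds check replaces the need for a trailing '.'.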

Also, you iterate over the indices 1 to len(p)-1, and then try to access p[i+1]. In the last iteration this will throw an IndexError, because i+1 is then equal to len(p). And because you want repeated vowels to count as a single group, comparing only the single characters on either side of the p doesn't work. You're going to have to group the vowels and non-vowels separately, and then join them into a single string.

def p_decode(p):
    vowels = {'a', 'e', 'i', 'o', 'u'}
    p_groups = []
    current_group = None
    for c in p:
        if current_group is not None:
            # If the 'vowelness' of the current group is the same as this character
            # or ( the current group is a vowel group 
            #      and the current character is a 'p'
            #      and the current group doesn't contain a 'p' already )
            if (c.lower() in vowels) == current_group[0] or \
               ( current_group[0] and 
                 c.lower() == 'p' and 
                 'p' not in current_group[1]):      
                current_group[1].append(c) # Add c to the current group
            else:
                current_group = None # Reset the current group to None so you can make it later

        if current_group is None:
            current_group = (c.lower() in vowels, [c]) # Make the current group
            p_groups.append(current_group) # Append it to the list

    # For scorpion => scoporpiopion
    # p_groups looks like this:
    # [(False, ['s', 'c']), 
    #  (True, ['o', 'p', 'o']), 
    #  (False, ['r', 'p']), 
    #  (True, ['i', 'o', 'p', 'i', 'o']), 
    #  (False, ['n'])]

    p_output = []
    for is_vowel_group, group_chars in p_groups:
        if is_vowel_group:
            h1 = group_chars[:len(group_chars)//2] # First half of the group
            h2 = group_chars[-len(group_chars)//2 + 1:] # Second half of the group, excluding the p

            # Add the first half to the output
            p_output.extend(h1)

            # Compare case-insensitively so that 'A' matches 'a' in 'Apa'
            if [c.lower() for c in h1] != [c.lower() for c in h2]:
                # The second half of this group is not repeated characters
                # so something in the input was wrong!
                raise ValueError(f"Invalid input '{p}' to p_decode(): vowels before and after 'p' are not the same in group '{''.join(group_chars)}'")

        else:
            # Add all chars in non-vowel groups to the output
            p_output.extend(group_chars)
    
    return "".join(p_output)

And now, we have:

words = ["An elephant", "scorpion", "boat", "boot", "Hello World", "stupid"]
for w in words:
    p = p_encode(w)
    d = p_decode(p)
    print(w, p, d, sep=" | ")

Which gives (prettification mine):

Word	Encoded	Decoded
An elephant	Apan epelepephapant	An elephant
scorpion	scoporpiopion	scorpion
boat	boapoat	boat
boot	boopoot	boot
Hello World	Hepellopo Woporld	Hello World
stupid	stupupipid	stupid

Also, input that isn't valid p-language (such as the plain word "stupid") raises a ValueError:

>>> p_decode("stupid")

ValueError: Invalid input 'stupid' to p_decode(): vowels before and after 'p' are not the same in group 'upi'