Is there a way to split a string by multiple different strings?

Time:09-26

I am trying to make a translator to a custom language. At the moment I am working towards being able to type "the man sits with the woman" and receive the output "de mno di felio colten aili", which word for word is "the man the woman sits with".

When the translator reaches a verb, it stores the translation in verbo, and a preposition goes into prepo2 (it's prepo2, not prepo, for other reasons). After it finishes translating word for word, it splits the result by all translations of "the" (de and di) and by the verb previously stored in verbo. It should then put the verb at the end, run the same step for the preposition instead of the verb, and put that at the end too.
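The reordering described above (verb to the end, then the preposition) can also be done on a list of words instead of by splitting strings. A minimal sketch, assuming the word-for-word output is already available as a space-separated string and that verbo and prepo2 hold the suffixed forms; move_to_end and reorder are hypothetical helpers, not part of the original code:

```python
def move_to_end(tokens, word):
    """Return a copy of tokens with the first occurrence of word moved to the end.

    If word is None or not present, the tokens are returned unchanged.
    """
    result = list(tokens)
    if word is None or word not in result:
        return result
    result.remove(word)   # removes only the first occurrence
    result.append(word)
    return result


def reorder(translated, verbo=None, prepo2=None):
    """Move the verb to the end, then the preposition."""
    tokens = translated.split()
    tokens = move_to_end(tokens, verbo)
    tokens = move_to_end(tokens, prepo2)
    return ' '.join(tokens)


print(reorder('de mno colten aili di felio', verbo='colten', prepo2='aili'))
# de mno di felio colten aili
```

Because verbo and prepo2 default to None here, a sentence without a verb or a preposition is simply left alone instead of relying on an exception to skip the step.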

When I enter "the man sits with the woman" I get "de mno di felio aili", with no "colten". When I enter "the man sits the woman" (yes, I know it's not a correct sentence, but it's the verb I'm testing with) I just get "de mno colten di felio", and when I enter "the man is with the woman" I get "de mno aili di felio".
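The last two results are consistent with the splitting step failing silently: when a sentence has no preposition (or no verb), prepo2 (or verbo) is never assigned, so using it raises UnboundLocalError, and the bare except: pass around the splitting code hides that. A small hypothetical reproduction (split_step is an illustrative name, not from the original code) shows what catching the specific error reveals:

```python
def split_step(sentence, has_verb):
    """Hypothetical reproduction: verbo is only assigned when a verb was seen."""
    if has_verb:
        verbo = 'colten'
    try:
        # Raises UnboundLocalError when has_verb is False.
        return sentence.replace(' ' + verbo, '') + ' ' + verbo
    except UnboundLocalError as exc:
        # A bare `except: pass` would hide this; reporting it shows the cause.
        print('splitting skipped:', exc)
        return sentence


print(split_step('de mno colten di felio', has_verb=True))
print(split_step('de mno aili di felio', has_verb=False))
```

Initialising verbo = None and prepo2 = None at the top of translate() and checking them before splitting avoids depending on the exception at all.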

Trying to get this done quickly so an answer would be appreciated :)

import string
import re

NounM = {
    'man': 'mno',
    'rock': 'lehr',
    'dog': 'krua'
}

NounF = {
    'woman': 'felio',
    'chair': 'poen',
    'cat': 'keile'
}

Verb = {
    'sit': 'colt',
    'sing': 'alet'
}

Preposition = {
    'on': 'mit',
    'with': 'ail',
    'at': 'zal'
}

Pronoun = {
    'he': 'tse',
    'she': 'se',
    'i': 'ile',
    'me': 'men',
    'they': 'er',
    'it': 'ze',
    'you': 'jü'
}

Adjective = {
    'happy': 'kliony',
    'sad': 'probo',
    'good': 'klio',
    'bad': 'pro'
}

Article = {
    'that': 'arei',
    'those': 'sie'
}

Question = {
    'who': 'nej',
    'what': 'kär',
    'when': 'woin',
    'where': 'ten',
    'why': 'apr'
}

Skip = ('is', 'are', 'am')

Preposition_Replace = ('aile', 'aili', 'mite', 'miti')

Verb_Replace = ('colten', 'aleten')

def translate(j=None):
    sentence = input('Enter the sentence to turn into your custom language! ')
    split = sentence.split()
    translated_list = []
    translated_sentence = ''

    for index, word in enumerate(split):
        char = ''
        for a in string.punctuation:
            if str(a) in word:
                char = a
        if word in NounM:
            translated_sentence += NounM[word]
        elif word in NounF:
            translated_sentence += NounF[word]
        elif word in Verb:
            translated_sentence += Verb[word]
            verbo = Verb[word]
        elif word in Preposition:
            translated_sentence += Preposition[word]
            prepo = Preposition[word]
            try:
                c = split[index + 1]
                while c not in NounM and c not in NounF:
                    a = 2
                    c = split[index + a]
                    a += 1
                if c in NounM:
                    translated_sentence += 'e'
                    prepo2 = prepo + 'e'
                elif c in NounF:
                    translated_sentence += 'i'
                    prepo2 = prepo + 'i'
            except IndexError:
                pass
        elif word in Pronoun:
            translated_sentence += Pronoun[word]
        elif word in Adjective:
            translated_sentence += Adjective[word]
        elif word in Article:
            translated_sentence += Article[word]
        elif word in Question:
            translated_sentence += Question[word]
        elif word == 'the':
            c = split[index + 1]
            while c not in NounM and c not in NounF:
                a = 2
                c = split[index + a]
                a += 1
            if c in NounM:
                translated_sentence += 'de'
            elif c in NounF:
                translated_sentence += 'di'
            else:
                pass
        elif word == 'a':
            c = split[index + 1]
            while c not in NounM and c not in NounF:
                a = 2
                c = split[index + a]
                a += 1
            if c in NounM:
                translated_sentence += 'es'
            elif c in NounF:
                translated_sentence += 'en'
            else:
                pass
        elif word in Skip:
            c = split[index + 1]
            if c == 'not':
                split.remove('not')
                translated_sentence += 'nen'
            else:
                pass
        elif word[len(word) - 1] == 's':
            word = word[:-1]
            if word in Verb:
                translated_sentence += Verb[word]
                translated_sentence += 'en'
                verbo = Verb[word] + 'en'
            else:
                pass
        else:
            translated_sentence += word
        word += str(char)
        for i in translated_sentence:
            translated_list += i
        translated_list += str(char)
        if word == 'is' or word == 'are' or word == 'am':
            if c == 'not':
                translated_list += ' '
            else:
                pass
        else:
            translated_list += ' '
        translated_sentence = ''

    leng = len(translated_list) - 2
    final = translated_list[leng]
    if final in string.punctuation:
        translated_list.remove(final)

    translated_sentence = ''
    for i in translated_list:
        translated_sentence += i

    if final in string.punctuation:
        translated_sentence += final

    try:
        new_sentence2 = ''
        new_sentence = re.split('(?=de |di )' and ' ' + verbo, translated_sentence)
        print(new_sentence)
        for i in new_sentence:
            new_sentence2 += i
        print(new_sentence2)
        new_sentence += verbo
        new_sentence = re.split('(?=de |di )' and ' ' + prepo2, new_sentence2)
        print(new_sentence)
        new_sentence.append(' ' + prepo2)
        print(new_sentence)
        translated_sentence = ''
        for i in new_sentence:
            translated_sentence += i
    except:
        pass

    print(translated_sentence)
    other_translate = input('Would you like to translate another sentence? y/n ')
    if other_translate == 'y':
        translate()


translate()

Main code for the splitting part:

try:
    new_sentence2 = ''
    new_sentence = re.split('(?=de |di )' and ' ' + verbo, translated_sentence)
    print(new_sentence)
    for i in new_sentence:
        new_sentence2 += i
    print(new_sentence2)
    new_sentence += verbo
    new_sentence = re.split('(?=de |di )' and ' ' + prepo2, new_sentence2)
    print(new_sentence)
    new_sentence.append(' ' + prepo2)
    print(new_sentence)
    translated_sentence = ''
    for i in new_sentence:
        translated_sentence += i
except:
    pass
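On the title question itself: in '(?=de |di )' and ' ' + verbo, Python's and returns its second operand when the first is truthy, so the pattern passed to re.split is only ' ' + verbo and the lookahead is discarded. Also, re.split drops the matched separators unless the pattern contains a capturing group. A minimal sketch of splitting on several different strings at once, escaping each with re.escape and joining them with | (the separator list is just an example):

```python
import re

verbo = 'colten'

# `and` does not combine patterns: here it evaluates to its second operand.
pattern = '(?=de |di )' and ' ' + verbo
print(repr(pattern))  # ' colten'

# To split on several different strings, join them into one alternation.
separators = ['de ', 'di ', verbo]
alternation = '|'.join(re.escape(s) for s in separators)

sentence = 'de mno colten aili di felio'

# Without a group, the separators themselves are dropped from the result.
print(re.split(alternation, sentence))
# ['', 'mno ', ' aili ', 'felio']

# With a capturing group, the separators are kept in the result list.
print(re.split('(' + alternation + ')', sentence))
# ['', 'de ', 'mno ', 'colten', ' aili ', 'di ', 'felio']
```

This also accounts for the missing "colten": new_sentence += verbo extends the list returned by re.split (character by character) and that list is overwritten on the very next line, so the verb is never added back to new_sentence2.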

Answer:

Wow, there's a lot going on in that function. My main advice would be to break things up into smaller functions. It'll make testing it a lot easier. Below is an example of how I would start building things up.

I know this solution is not complete and does not address the reordering of the sentence, but it is a lot more readable and easier to change. Consider restarting your code; it's an interesting idea.

import string
import re

NounM = {
    'man': 'mno',
    'rock': 'lehr',
    'dog': 'krua'
}

NounF = {
    'woman': 'felio',
    'chair': 'poen',
    'cat': 'keile'
}

Verb = {
    'sit': 'colt',
    'sing': 'alet'
}

Preposition = {
    'on': 'mit',
    'with': 'ail',
    'at': 'zal'
}

Pronoun = {
    'he': 'tse',
    'she': 'se',
    'i': 'ile',
    'me': 'men',
    'they': 'er',
    'it': 'ze',
    'you': 'jü'
}

Adjective = {
    'happy': 'kliony',
    'sad': 'probo',
    'good': 'klio',
    'bad': 'pro'
}

Article = {
    'that': 'arei',
    'those': 'sie'
}

Question = {
    'who': 'nej',
    'what': 'kär',
    'when': 'woin',
    'where': 'ten',
    'why': 'apr'
}

Skip = ('is', 'are', 'am')

Preposition_Replace = ('aile', 'aili', 'mite', 'miti')

Verb_Replace = ('colten', 'aleten')

gender = ''

def translate_word(word):
    global gender
    word, punctuation = check_punctuation(word)
    word = check_plurality(word)
    if word in NounM:
        gender = 'm'
        return f"de {NounM[word]}{punctuation}"
    elif word in NounF:
        gender = 'f'
        return f"di {NounF[word]}{punctuation}"
    elif word in Verb:
        ending = get_gendered_verb_ending()
        return f"{Verb[word]}{ending}{punctuation}"
    elif word in Preposition:
        return f"{Preposition[word]}{punctuation}"
    else:
        return

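def check_punctuation(word):
    # Return the word without its first punctuation character, plus that character.
    for p in string.punctuation:
        if p in word:
            return word.replace(p, ''), p
    # Only conclude "no punctuation" after every character has been checked.
    return word, ''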

def get_gendered_verb_ending():
    global gender
    if gender == 'm':
        return 'es'
    elif gender == 'f':
        return 'en'
    else:
        return ''

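def check_plurality(word):
    # Naively treat a trailing 's' as a plural/third-person ending.
    # endswith() also avoids an IndexError on an empty string.
    if word.endswith('s'):
        return word[:-1]
    return word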


def translate():
    sentence = input('Enter the sentence to turn into your custom language! ')
    split = sentence.split()
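    # Translate each word once, then drop words with no known translation (None).
    translated_words = [translate_word(word) for word in split]
    translated = " ".join(w for w in translated_words if w is not None)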
    print(translated)

translate()
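One benefit of the smaller functions is that each can be checked on its own with plain assert statements, without going through input(). A minimal sketch using a hypothetical split_punctuation helper, a variant of check_punctuation that only strips punctuation from the end of a word via str.rstrip:

```python
import string


def split_punctuation(word):
    """Split trailing punctuation off a word: 'woman!' -> ('woman', '!')."""
    stripped = word.rstrip(string.punctuation)
    return stripped, word[len(stripped):]


# Each helper can be tested in isolation.
assert split_punctuation('woman!') == ('woman', '!')
assert split_punctuation('man') == ('man', '')
assert split_punctuation('what?!') == ('what', '?!')
print('all checks passed')
```

Unlike the replace-based check_punctuation, this keeps an apostrophe inside a word such as "don't" intact, which may or may not be what the custom language needs.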

Answer:

I would suggest creating a class for words that holds a number of semantic attributes (possibly subclassing it for the different parts of speech). Building vocabularies from instances of this class and linking them together for translation lets you implement generic rules that apply to whole swaths of word categories, which reduces the amount of code. Also, it will probably be easier (initially at least) to work with full word spellings rather than trying to compose words from stems, roots and morphemes.

For example (not the complete program, just a general illustration):

Base class for words / vocabularies:

class Word:
    NOUN    = "NOUN"
    VERB    = "VERB"
    PRONOUN = "PRONOUN"
    ADJ     = "ADJECTIVE"
    ADVERB  = "ADVERB"
    PREP    = "PREPOSITION"
    CONJ    = "CONJUNCTION"
    ARTICLE = "ARTICLE"

    MASCULINE = "M"
    FEMININE  = "F"
    NEUTRAL   = "N"

    SINGULAR = "S"
    DUAL     = "2"
    PLURAL   = "P"
    
    lexicon = dict()
    
    def __init__(self,spelling,part="NOUN",lang="EN",gender="N",plural="S"):
        self.part         = part
        self.spelling     = spelling.lower()
        self.language     = lang.upper()
        self.gender       = gender
        self.plural       = plural
        self.translations = set()
        vocabulary    = Word.lexicon.setdefault(self.language,dict())
        vocabulary.setdefault(self.spelling,[]).append(self)

    @staticmethod
    def noun(spelling,*args,**kwargs):         return Word(spelling,Word.NOUN,*args,**kwargs)
    @staticmethod
    def verb(spelling,*args,**kwargs):         return Word(spelling,Word.VERB,*args,**kwargs)
    @staticmethod
    def pronoun(spelling,*args,**kwargs):      return Word(spelling,Word.PRONOUN,*args,**kwargs)
    @staticmethod
    def adjective(spelling,*args,**kwargs):    return Word(spelling,Word.ADJ,*args,**kwargs)
    @staticmethod
    def adverb(spelling,*args,**kwargs):       return Word(spelling,Word.ADVERB,*args,**kwargs)
    @staticmethod
    def preposition(spelling,*args,**kwargs):  return Word(spelling,Word.PREP,*args,**kwargs)
    @staticmethod
    def conjunction(spelling,*args,**kwargs):  return Word(spelling,Word.CONJ,*args,**kwargs)
    @staticmethod
    def article(spelling,*args,**kwargs):      return Word(spelling,Word.ARTICLE,*args,**kwargs)

    def trans(self,*words,lang="CL"): # CL for Custom Language
        lang = lang.upper()
        for word in words:
            if isinstance(word,str):
                word = word.lower()
                vocabulary = Word.lexicon.setdefault(lang,dict())
                if word not in vocabulary:
                    word = Word(word,part=self.part,lang=lang,gender=self.gender,plural=self.plural)
                    self.translations.add(word)
                else:
                    self.translations.update(w for w in vocabulary[word] if w.part==self.part)
            else:
                self.translations.add(word)
        return self

    def getTrans(self,lang="CL"):
        return [w for w in self.translations if w.language==lang.upper()]

Meta-data:

Word.noun('man',gender="M",plural="S").trans('mno')
Word.noun('rock').trans(Word('lehr',lang="CL",gender="M"))
Word.noun('dog').trans(Word.noun('krua',lang="CL",gender="M"))
Word.noun('woman',gender="F",plural="S").trans('felio')
Word.noun('chair').trans(Word.noun('poen',lang="CL",gender="F"))
Word.noun('cat').trans(Word('keile',lang="CL",gender="F"))
Word.verb('sit').trans('colt')
Word.verb('sits').trans('colt')
Word.verb('sing').trans('alet')
Word.preposition('on').trans('mit')
Word.preposition('with').trans('ail')
Word.preposition('at').trans('zal')
Word.pronoun('he',gender="M").trans('tse')
Word.pronoun('she',gender="F").trans('se')
Word.pronoun('i').trans('ile')
Word.pronoun('me').trans('men')
Word.pronoun('they',plural="P").trans('er')
Word.pronoun('it').trans('ze')
Word.pronoun('you').trans('jü')
Word.adjective('happy').trans('kliony')
Word.adjective('sad').trans('probo')
Word.adjective('good').trans('klio')
Word.adjective('bad').trans('pro')
Word.article('the').trans('de')
Word.article('that').trans('arei')
Word.article('those').trans('sie')
Word.adverb('who').trans('nej')
Word.adverb('what').trans('kär')
Word.adverb('when').trans('woin')
Word.adverb('where').trans('ten')
Word.adverb('why').trans('apr')
          

Parsing and translation:

import re

sentence = "the man sits with the woman"

parsed = [Word.lexicon['EN'][w] for w in re.findall(r"\w+",sentence)]

word2Word = [w[0].getTrans()[0].spelling for w in parsed]
print(*word2Word)
'de mno colt ail de felio'


verbPos  = next(i for i,w in enumerate(parsed) if w[0].part == Word.VERB)

SOV = word2Word[:verbPos] + word2Word[verbPos+1:] + word2Word[verbPos:verbPos+1]

print(*SOV) # Subject-Object-Verb
'de mno ail de felio colt'
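The same idea extends to moving more than one word: give each part of speech a rank and sort by it. Python's sorted() is stable, so words with equal rank keep their original order. A minimal sketch, assuming each word is already paired with a part-of-speech tag (the plain tuples and the RANK table are illustrative, not the Word class above):

```python
# Hypothetical (spelling, part) pairs, as produced by a word-for-word pass.
tagged = [('de', 'ARTICLE'), ('mno', 'NOUN'), ('colt', 'VERB'),
          ('ail', 'PREPOSITION'), ('de', 'ARTICLE'), ('felio', 'NOUN')]

# Lower rank comes first; parts not listed get rank 0 and stay in front.
RANK = {'VERB': 1, 'PREPOSITION': 2}

# sorted() is stable, so words with the same rank keep their relative order.
reordered = sorted(tagged, key=lambda pair: RANK.get(pair[1], 0))
print(*(spelling for spelling, _ in reordered))
# de mno de felio colt ail
```

This gives the question's verb-then-preposition order; the gendered forms (di, colten, aili) would still need rules of their own.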

You can enrich the meta-data classes as needed, for example to add subtypes to adverbs (e.g. questions), person to verbs, etc. More attributes will help in building generic rules that don't depend on spelling.
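For instance, the question's rule that "the" becomes de before a masculine noun and di before a feminine one can be written against a gender attribute instead of a list of spellings. A minimal sketch with a hypothetical Noun dataclass standing in for the Word class:

```python
from dataclasses import dataclass


@dataclass
class Noun:
    spelling: str   # spelling in the custom language
    gender: str     # "M" or "F", like Word.MASCULINE / Word.FEMININE


# One entry per attribute value, rather than one entry per spelling.
DEFINITE_ARTICLE = {'M': 'de', 'F': 'di'}


def with_article(noun):
    """Prefix a noun with the definite article matching its gender."""
    return f"{DEFINITE_ARTICLE[noun.gender]} {noun.spelling}"


print(with_article(Noun('mno', 'M')))    # de mno
print(with_article(Noun('felio', 'F')))  # di felio
```

Any noun added later with a gender attribute gets the right article without touching the rule.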

Answer:

Since your target language doesn't have the same word order as English, translating from English requires parsing it into a structure that keeps information about the text beyond just which words it contains.

That's called natural language processing (NLP), and there are many libraries for it. I suggest you check out those libraries to help you parse the input. The main one for Python is NLTK, but there is also TextBlob, which aims to have an easier interface.

I haven't used them enough to give any further recommendations than that.
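To illustrate what such a structure can look like without any library: tag each word with a part of speech from a small lexicon, then group an article with the noun that follows it into a noun phrase. This is only a toy stand-in for what NLTK or TextBlob do with trained taggers; the lexicon, tag names and helper functions are assumptions made for the sketch.

```python
# Toy lexicon: English word -> part-of-speech tag (illustrative only).
LEXICON = {
    'the': 'DET', 'a': 'DET',
    'man': 'NOUN', 'woman': 'NOUN', 'dog': 'NOUN',
    'sits': 'VERB', 'sings': 'VERB',
    'with': 'PREP', 'on': 'PREP', 'at': 'PREP',
}


def tag(sentence):
    """Pair each lowercase word with its tag ('UNK' if unknown)."""
    return [(w, LEXICON.get(w, 'UNK')) for w in sentence.lower().split()]


def chunk(tagged):
    """Group determiners with the following noun into NP chunks."""
    chunks, pending = [], []
    for word, pos in tagged:
        if pos == 'DET':
            pending.append(word)  # wait for the noun it belongs to
        elif pos == 'NOUN':
            chunks.append(('NP', pending + [word]))
            pending = []
        else:
            # Determiners not followed by a noun are kept as-is.
            chunks.extend(('DET', [w]) for w in pending)
            pending = []
            chunks.append((pos, [word]))
    chunks.extend(('DET', [w]) for w in pending)
    return chunks


for label, words in chunk(tag('the man sits with the woman')):
    print(label, words)
```

With phrases identified, reordering becomes a matter of moving whole chunks rather than splitting strings.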
