Create a regular expression that continues to extract information if and only if the following patte

Time:07-14

import re

def fun(x):
    match = re.search(r"(?<=hay que) ([\w\s,]+) ([\w\s]+)", x)
    if match:
        for i in re.split(',|y',match[1]):
            with open(f'{i}.txt','w') as file:
                file.write(match[2])


input_text = str(input())
fun(input_text)

I need a regular expression that keeps matching words only while each one is followed by a comma (,) or by y. Every word extracted this way becomes the name of a text file, as shown in the examples below, and the rest of the sentence must be written as a single line in each of the .txt files created.

I am having trouble with sentences such as this one:

hay que pintar y decorar las paredes de ese lugar

generate: pintar.txt, decorar las paredes de ese.txt

write inside each of them: lugar

but it should be:

generate: pintar.txt, decorar.txt

write inside each of them: las paredes de ese lugar


Other Examples...

input_sense: hay que pintar y decorar las paredes y los techos

generate: pintar.txt, decorar.txt

write inside each of them: las paredes y los techos


input_sense: hay que correr, saltar y cantar para llegar alli

generate: correr.txt, saltar.txt, cantar.txt

write inside each of them: para llegar alli


input_sense: yo creo que hay que saltar y correr para ir a ese lugar

generate: saltar.txt, correr.txt

write inside each of them: para ir a ese lugar


IMPORTANT: if a listed word begins with no ser, ser, or no, that prefix must stay attached to the word that follows it, joined with an underscore in the file name:

input_sense: hay que esconderse y ser silenciosos para no ser descubiertos

generate: esconderse.txt, ser_silenciosos.txt

write inside each of them: para no ser descubiertos


input_sense: hay que trabajar, escalar y no temer si quieres llegar a la meta

generate: trabajar.txt, escalar.txt, no_temer.txt

write inside each of them: si quieres llegar a la meta


CodePudding user response:

I would suggest first finding what should be written:

import re

x = 'hay que trabajar, escalar y no temer si quieres llegar a la meta'
# The content starts at the first " las ", " los ", " para " or " si "
s = re.search(r'( las .*)|( los .*)|( para .*)|( si .*)', x)
content = s.group(0)
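
One caveat with this step: `re.search` returns `None` when none of the four marker words occurs in the sentence, so `s.group(0)` would raise `AttributeError`. A minimal sketch of a guarded version follows. The helper name `find_content` is hypothetical, and the marker list is only the one used in the pattern above, not a complete list for Spanish.

```python
import re

# Marker words taken from the pattern above; extend the tuple for other sentences.
MARKERS = ("las", "los", "para", "si")
CONTENT_RE = re.compile(r"\s(?:" + "|".join(MARKERS) + r")\s.*")

def find_content(sentence):
    """Return the text starting at the first marker word, or None if there is none."""
    match = CONTENT_RE.search(sentence)
    if match is None:
        return None
    return match.group(0).strip()

print(find_content("hay que pintar y decorar las paredes y los techos"))
print(find_content("hay que correr"))
```

The second call has no marker word, so the helper returns `None` instead of raising.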

Then remove what you do not want from your string:

x = x.replace('hay que','')
x = x.replace(content, '')
x = x.strip()  # str.strip() returns a new string, so the result must be assigned

Replace the space after each special word with an underscore:

x = x.replace(' no ', ' no_')
x = x.replace(' ser ', ' ser_')
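
These two `replace` calls need a space in front of the special word, and they do not join `no ser` to the word after it. A possible alternative is sketched below, assuming the prefixes are exactly `no ser`, `ser` and `no` as listed in the question. The helper name `join_prefix` is made up for illustration.

```python
import re

# "no ser" is listed first so it wins over the shorter alternatives "ser" and "no".
# \b keeps the pattern from matching inside longer words such as "camino".
PREFIX_RE = re.compile(r"\b(no ser|ser|no)\s+(?=\w)")

def join_prefix(text):
    """Attach each prefix to the following word with underscores."""
    return PREFIX_RE.sub(lambda m: m.group(1).replace(" ", "_") + "_", text)

print(join_prefix("trabajar, escalar y no temer"))
print(join_prefix("esconderse y ser silenciosos"))
```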

Finally, split the string into your file names:

filenames = [f'{name.strip()}.txt' for name in re.split(',| y ', x)]
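
The steps above depend on the content starting with one of a few marker words. As an alternative sketch (not the approach of this answer), a single pattern can describe the list itself: one item, then further items only while a comma or `y` follows, then everything else as content. The names `ITEM`, `SEP`, `plan_files` and `write_files` are hypothetical, and the pattern assumes each list item is a single word with an optional `no ser`, `ser` or `no` prefix.

```python
import re

# One list item: an optional prefix ("no ser", "ser" or "no") followed by one word.
ITEM = r"(?:(?:no\s+ser|ser|no)\s+)?\w+"
# Separators that let the list continue: a comma or the word "y".
SEP = r"(?:\s*,\s*|\s+y\s+)"
SENTENCE_RE = re.compile(
    rf"hay que\s+(?P<items>{ITEM}(?:{SEP}{ITEM})*)\s+(?P<content>.+)"
)

def plan_files(sentence):
    """Return (filenames, content), or None if the sentence does not match."""
    match = SENTENCE_RE.search(sentence)
    if match is None:
        return None
    items = re.split(SEP, match.group("items"))
    # Spaces inside an item ("no temer") become underscores in the file name.
    filenames = [re.sub(r"\s+", "_", item) + ".txt" for item in items]
    return filenames, match.group("content").strip()

def write_files(sentence):
    """Create one file per list item, each holding the rest of the sentence."""
    plan = plan_files(sentence)
    if plan is None:
        return []
    filenames, content = plan
    for filename in filenames:
        with open(filename, "w", encoding="utf-8") as file:
            file.write(content + "\n")
    return filenames

print(plan_files("hay que trabajar, escalar y no temer si quieres llegar a la meta"))
```

Because the list stops at the first word that is not followed by a comma or `y`, a later `y` inside the content (as in `las paredes y los techos`) is left alone. Keeping `plan_files` separate from `write_files` makes it possible to check the result before any file is created.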