How to accumulate the modifications made on a string in each one of the iterations of a for loop?

Time:01-23

import re

input_text = "Acá festejaremos mi cumpleaños. Yo ya sabía que los naipes estaban abajo de su manga." #example 1

list_all_adverbs_of_place = ["aquí", "aqui", "acá" , "aca", "abajo", "bajo", "alrededor", "al rededor"]
place_reference = r"((?i:\w\s*)+)?" # optional capturing group: one or more word characters, each optionally followed by whitespace

for place_adverb in list_all_adverbs_of_place:
    pattern = r"(" + place_adverb + r")\s+(?i:del|de)\s+" + place_reference + r"\s*(?:[.\n;,]|$)"

    input_text = re.sub(pattern, 
                        lambda m: f"((PL_ADVB='{m[2] or ''}'){m[1]})", 
                        input_text, re.IGNORECASE)

print(repr(input_text)) # --> OUTPUT

How can I keep the input_text variable from being reset on each iteration of the for loop, so that the changes made by re.sub() in one iteration carry over to the following iterations? The output I expect is:

"((PL_ADVB='')Acá) festejaremos mi cumpleaños. Yo ya sabía que los naipes estaban ((PL_ADVB='su manga')abajo)." #for example 1


CodePudding user response:

What makes you think input_text is being reset after each iteration? Your example seems to be working as it should. Actually, a loop is unnecessary here. You can build your pattern by joining the list elements with |:

import re

input_text = "Acá festejaremos mi cumpleaños. Yo ya sabía que los naipes estaban abajo de su manga." #example 1

list_all_adverbs_of_place = ["aquí", "aqui", "acá" , "aca", "abajo", "bajo", "alrededor", "al rededor"]
place_reference = r"((?i:\w\s*)+)?" # optional capturing group: one or more word characters, each optionally followed by whitespace

pattern = re.compile(rf"({'|'.join(list_all_adverbs_of_place)})\s+(?i:del|de)\s+{place_reference}\s*(?:[.\n;,]|$)", re.IGNORECASE)

input_text = re.sub(pattern, 
                    r"((PL_ADVB='\2')\1)", 
                    input_text)

print(input_text)

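Edit: re.IGNORECASE belongs in the pattern (pass it to re.compile(), or to re.sub() as flags=re.IGNORECASE). As the fourth positional argument of re.sub() it is interpreted as count, not as a flag.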

Output:

Acá festejaremos mi cumpleaños. Yo ya sabía que los naipes estaban ((PL_ADVB='su manga')abajo)

Note: Acá is not captured because it is not followed by del/de.
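
To illustrate the first edit: the signature is re.sub(pattern, repl, string, count=0, flags=0), so a flag placed in the fourth position ends up as count. A minimal sketch, using a made-up sample string that is not part of the question:

```python
import re

text = "aca Aca aca aca"  # hypothetical sample string, chosen only for this demo

# re.IGNORECASE has the integer value 2. Passing it as the fourth positional
# argument of re.sub() therefore amounts to count=2: matching stays
# case-sensitive and only the first two matches are replaced.
as_count = re.sub(r"aca", "X", text, count=int(re.IGNORECASE))

# Passing it by keyword applies it as a flag, so every spelling is replaced.
as_flags = re.sub(r"aca", "X", text, flags=re.IGNORECASE)

print(as_count)
print(as_flags)
```

Compiling the pattern with re.compile(..., re.IGNORECASE), as in the code above, avoids the ambiguity altogether.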

Edit2: to get the desired result, you could make all the groups capturing and check in the lambda function whether group 2 (i.e. del/de) is empty:

pattern = re.compile(rf"({'|'.join(list_all_adverbs_of_place)})(\s+(?:del|de))?\s+{place_reference}\s*([.\n;,]|$)", re.IGNORECASE)

input_text = re.sub(pattern, 
                    lambda m: f"((PL_ADVB='{m[3]}'){m[1]}){m[4]}" if m[2] else f"((PL_ADVB=''){m[1]}) {m[3] + m[4]}",
                    input_text)

Output:

((PL_ADVB='')Acá) festejaremos mi cumpleaños. Yo ya sabía que los naipes estaban ((PL_ADVB='su manga')abajo).
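
On the original question itself: reassigning the variable inside the loop is already enough for the changes to accumulate. The following is only a sketch under assumptions: the sample sentence, the shortened adverb list, the simplified pattern and the use of re.escape() are illustrative and not taken from the question.

```python
import re

# Hypothetical sample sentence and shortened adverb list, for illustration only.
text = "Corrían alrededor de la mesa. Los naipes estaban abajo de su manga."
adverbs = ["abajo", "alrededor"]

snapshots = []  # state of the string after each iteration
for adverb in adverbs:
    # Simplified pattern (an assumption): adverb, "de"/"del", then one or more words.
    # re.escape() guards against adverbs containing regex metacharacters.
    pattern = rf"({re.escape(adverb)})\s+(?:del|de)\s+(\w+(?:\s+\w+)*)"
    # Reassigning `text` means the next iteration starts from the modified string.
    text = re.sub(pattern,
                  lambda m: f"((PL_ADVB='{m[2]}'){m[1]})",
                  text, flags=re.IGNORECASE)
    snapshots.append(text)

print(snapshots[0])  # only "abajo" has been tagged at this point
print(text)          # both adverbs are tagged
```

The first snapshot contains only the abajo substitution, while the final string contains both, which shows that nothing is reset between iterations.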