import re
input_text = "Acá festejaremos mi cumpleaños. Yo ya sabía que los naipes estaban abajo de su manga." #example 1
list_all_adverbs_of_place = ["aquí", "aqui", "acá" , "aca", "abajo", "bajo", "alrededor", "al rededor"]
place_reference = r"((?i:\w\s*) )?" #capturing group for an alphanumeric string with upper or lower case
for place_adverb in list_all_adverbs_of_place:
pattern = r"(" place_adverb r")\s (?i:del|de)\s " place_reference r"\s*(?:[.\n;,]|$)"
input_text = re.sub(pattern,
lambda m: f"((PL_ADVB='{m[2] or ''}'){m[1]})",
input_text, re.IGNORECASE)
print(repr(input_text)) # --> OUTPUT
How to make the input_text
variable not be reset in each iteration of the for loop, so that the changes made by the re.sub()
function in one iteration are kept for the following iterations of the loop?
"((PL_ADVB='')Acá) festejaremos mi cumpleaños. Yo ya sabía que los naipes estaban ((PL_ADVB='su manga')abajo)." #for example 1
CodePudding user response:
What makes you think input_text
is being reset after each iteration? Your example seems to be working as it should. Actually, a loop is unnecessary here. You can build your pattern by joining the list elements with |
:
import re
input_text = "Acá festejaremos mi cumpleaños. Yo ya sabía que los naipes estaban abajo de su manga." #example 1
list_all_adverbs_of_place = ["aquí", "aqui", "acá" , "aca", "abajo", "bajo", "alrededor", "al rededor"]
place_reference = r"((?i:\w\s*) )?" #capturing group for an alphanumeric string with upper or lower case
pattern = re.compile(rf"({'|'.join(list_all_adverbs_of_place)})\s (?i:del|de)\s {place_reference}\s*(?:[.\n;,]|$)", re.IGNORECASE)
input_text = re.sub(pattern,
r"((PL_ADVB='\2')\1)",
input_text)
print(input_text)
Edit: re.IGNORECASE
belongs in the pattern
Output:
Acá festejaremos mi cumpleaños. Yo ya sabía que los naipes estaban ((PL_ADVB='su manga')abajo)
Note: Acá
is not captured because it is not followed be del/de.
Edit2: in order to get the desired result, all your groups could be capturing and you could check if #2 (ie. del/de) is empty in the lambda function:
pattern = re.compile(rf"({'|'.join(list_all_adverbs_of_place)})(\s (?:del|de))?\s {place_reference}\s*([.\n;,]|$)", re.IGNORECASE)
input_text = re.sub(pattern,
lambda m: f"((PL_ADVB='{m[3]}'){m[1]}){m[4]}" if m[2] else f"((PL_ADVB=''){m[1]}) {m[3] m[4]}",
input_text)
Output:
((PL_ADVB='')Acá) festejaremos mi cumpleaños. Yo ya sabía que los naipes estaban ((PL_ADVB='su manga')abajo).