import re
def one_day_or_another_day_relative_to_a_date_func(input_text):
#print(repr(input_text)) #print what you have captured, and you should replace
return "aaaaaaaa"
def identify(input_text):
some_text = r"(?:(?!\.\s*?\n)[^;])*"
date_capture_pattern = r"([12]\d{3}-[01]\d-[0-3]\d)(\D*?)"
previous_days = r"(\d )\s*(?:dias|dia)\s*(?:antes|previos|previo|antes|atrás|atras)\s*"
after_days = r"(\d )\s*(?:dias|dia)\s*(?:después|despues|luego)\s*"
n_patterns = [
previous_days r"(?:del|de\s*el|de|al|a)\s*" some_text date_capture_pattern some_text r"\s*(?:,\s*o|o)\s*" previous_days,
after_days r"(?:del|de\s*el|de|al|a)\s*" some_text date_capture_pattern some_text r"\s*(?:,\s*o|o)\s*" previous_days,
previous_days r"(?:del|de\s*el|de|al|a)\s*" some_text date_capture_pattern some_text r"\s*(?:,\s*o|o)\s*" after_days,
after_days r"(?:del|de\s*el|de|al|a)\s*" some_text date_capture_pattern some_text r"\s*(?:,\s*o|o)\s*" after_days]
#Itero la lista de patrones de búsqueda para que el programa intente con uno por uno
for n_pattern in n_patterns:
#Este es mi intento de realizar el reemplazo, aunque tiene problemas con modificadores non-greedy
input_text = re.sub(n_pattern, one_day_or_another_day_relative_to_a_date_func , input_text, re.IGNORECASE)
input_texts = ["8 dias antes o 9 dias antes del 2022-12-22",
"2 dias despues o 1 dia antes del 2022-12-22, dia en donde ocurrio",
"a tan solo 2 dias despues de 2022-12-22 o a caso eran 6 dias despues, mmm no recuerdo bien",
]
#Testing...
for input_text in input_texts:
#print(input_text)
print(one_day_or_another_day_relative_to_a_date_func(input_text))
Incorrect output that I am getting, because if I incorrectly capture the substrings, the replacements will also be incorrect
"aaaaaaaa"
"aaaaaaaa"
"aaaaaaaa"
Having well-defined limits, I don't understand why this capture pattern try to capture beyond them?
And the output that I need is that:
"aaaaaaaa"
"aaaaaaaa, dia en donde ocurrio"
"a tan solo aaaaaaaa, mmm no recuerdo bien"
CodePudding user response:
There are several errors in your code, among which:
- You are printing the result of the
one_day_or_another_day_relative_to_a_date_func
function. Print the result ofidentify
instead. - In the
identify
function you are not returning the result text. Addreturn input_text
at the end of it. - Make the "o..." suffix optional.
- Use regex alternation instead of multiple patterns, otherwise you may get unexpected results.
Fixed code (I've also made it more compact):
import re
def one_day_or_another_day_relative_to_a_date_func(input_text):
#print(repr(input_text)) #print what you have captured, and you should replace
return "aaaaaaaa"
def identify(input_text):
some_text = r"(?:(?!\.\s*?\n)[^;])*"
date_capture_pattern = r"([12]\d{3}-[01]\d-[0-3]\d)(\D*?)"
previous_days = r"antes|previos|previo|antes|atrás|atras"
after_days = r"después|despues|luego"
prev_or_after = r"(\d )\s*(?:dias|dia)\s*(?:" previous_days "|" after_days ")\s*"
preposition = r"(?:del|de\s*el|de|al|a)\s*"
suffix = "(?:" r"\s*(?:,\s*o|o)\s*" some_text prev_or_after ")?"
pattern = prev_or_after some_text preposition date_capture_pattern suffix
input_text = re.sub(pattern, one_day_or_another_day_relative_to_a_date_func , input_text, re.IGNORECASE)
return input_text
input_texts = ["8 dias antes o 9 dias antes del 2022-12-22",
"2 dias despues o 1 dia antes del 2022-12-22, dia en donde ocurrio",
"a tan solo 2 dias despues de 2022-12-22 o a caso eran 6 dias despues, mmm no recuerdo bien",
]
#Testing...
for input_text in input_texts:
#print(input_text)
print(identify(input_text))
Result:
aaaaaaaa
aaaaaaaa, dia en donde ocurrio
a tan solo aaaaaaaa, mmm no recuerdo bien