import re
input_text = "05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!
input_text = "04 del 05 del 07 del 2000" #example 1 - Not modify!
input_text = "04 05 del 06 de 200" #example 2 - Yes modify!
input_text = "04 05 del 06 de 20076 55" #example 3 - Yes modify!
detection_regex_obligatory_preposition = r"\d{2}" r"[\s|](?:del|de[\s|]el|de )[\s|]" r"\d{2}" r"[\s|](?:del|de[\s|]el|de )[\s|]" r"\d*"
year, month, days_intervale_or_day = "", "", "" # = group()[2], group()[1], group()[0]
date_restructuring_structure = days_intervale_or_day + "-" + month + "-" + year
print(repr(date_restructuring_structure))
input_text = re.sub(detection_regex_obligatory_preposition, date_restructuring_structure, input_text)
print(repr(input_text)) # --> output
The correct outputs for each of these cases are:
""
"05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!
""
"04 del 05 del 07 del 2000" #example 1 - Not modify!
"05-06-200"
"04 05-06-200" #example 2 - Yes modify!
"05-06-20076"
"04 05-06-20076 55" #example 3 - Yes modify!
Example 1 should not be replaced, because more than one day is indicated in front of the date, which gives a structure like this:
\d{2} del \d{2} del \d{2} del \d*
and not this:
\d{2} del \d{2} del \d*
Something similar happens in example 0, where no replacement should be performed because the text has this structure:
\d{2} del \d{2} del \d* de \d{2}
or this:
\d{2} del \d{2} del \d* de \d*
and not this:
\d{2} del \d{2} del \d*
How can I set up the capture groups and the regex so that the replacements are performed in examples 2 and 3, but not in examples 0 and 1?
CodePudding user response:
Demo: https://regex101.com/r/w7Yp7J/1
import re
#input_text = "05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!
#input_text = "04 del 05 del 07 del 2000" #example 1 - Not modify!
input_text = "04 05 del 06 de 200" #example 2 - Yes modify!
input_text = "04 05 del 06 de 20076 55" #example 3 - Yes modify!
detection_regex_obligatory_preposition = r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|](?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"
date_restructuring_structure = r"\g<startDay> \g<finishDay>-\g<month>-\g<year>"
input_text = re.sub(detection_regex_obligatory_preposition, date_restructuring_structure, input_text)
print(repr(input_text)) # --> output
To test your code on Regex101, I combined your rules as follows:
\d{2}[\s|](?:del|de[\s|]el|de )[\s|]\d{2}[\s|](?:del|de[\s|]el|de)[\s|]\d*
I noticed that it captures exactly the inputs we do not want and misses the ones we do want:
05 del 07 del 2000 del 09 hhggh #example 0 - Captured
04 del 05 del 07 del 2000 #example 1 - Captured
04 05 del 06 de 200 #example 2 - Not Captured
04 05 del 06 de 20076 55 #example 3 - Not Captured
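The check above can be reproduced locally with a short script. This is only a sketch, assuming the pattern exactly as written in the question, where both alternations end in `de ` with a trailing space; results can differ if that space is dropped. The names `question_pattern`, `examples` and `captured` are illustrative, not part of the original code.

```python
import re

# Pattern copied from the question: both alternations contain "de " (trailing space).
question_pattern = (
    r"\d{2}[\s|](?:del|de[\s|]el|de )[\s|]"
    r"\d{2}[\s|](?:del|de[\s|]el|de )[\s|]\d*"
)

examples = [
    "05 del 07 del 2000 del 09 hhggh",  # example 0
    "04 del 05 del 07 del 2000",        # example 1
    "04 05 del 06 de 200",              # example 2
    "04 05 del 06 de 20076 55",         # example 3
]

# True where the pattern finds a match anywhere in the text.
captured = [re.search(question_pattern, text) is not None for text in examples]

for index, (text, hit) in enumerate(zip(examples, captured)):
    print(f"example {index}: {'captured' if hit else 'not captured'} -> {text!r}")
```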
To capture the correct inputs, I modified your rule by adding a two-digit number (\d{2}) at the beginning:
\d{2}[\s|]\d{2}[\s|](?:del|de[\s|]el|de )[\s|]\d{2}[\s|](?:del|de[\s|]el|de)[\s|]\d*
Now it captures the correct inputs, and we can turn to the replacement rules. There are two kinds of replacement references. The first is the numbered form (like \1 \2-\3-\4 in our case), which is the default: anything wrapped in plain parentheses becomes a numbered group. The second is the named form (like \g<startDay> \g<finishDay>-\g<month>-\g<year> in our case), which I prefer. To use named references, you need named capturing groups of the form (?P<startDay>...).
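To make the difference between the two reference styles concrete, here is a small sketch that applies both to example 2. The variable names `numbered`, `named`, `by_number` and `by_name` are made up for illustration; the patterns are the rule above with plain and with named groups.

```python
import re

text = "04 05 del 06 de 200"  # example 2 from the question

# The same rule twice: once with plain (numbered) groups, once with named groups.
numbered = (
    r"(\d{2})[\s|](\d{2})[\s|](?:del|de[\s|]el|de )[\s|]"
    r"(\d{2})[\s|](?:del|de[\s|]el|de)[\s|](\d*)"
)
named = (
    r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|]"
    r"(?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"
)

# Raw strings keep the backslashes in the replacement templates intact.
by_number = re.sub(numbered, r"\1 \2-\3-\4", text)
by_name = re.sub(named, r"\g<startDay> \g<finishDay>-\g<month>-\g<year>", text)

print(by_number)
print(by_name)
```

Both calls should print the same string; the named form is simply easier to read once there are several groups.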
Let's add named capturing groups to our rule:
(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|](?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)
The final code:
import re
#input_text = "05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!
#input_text = "04 del 05 del 07 del 2000" #example 1 - Not modify!
input_text = "04 05 del 06 de 200" #example 2 - Yes modify!
input_text = "04 05 del 06 de 20076 55" #example 3 - Yes modify!
detection_regex_obligatory_preposition = r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|](?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"
date_restructuring_structure = r"\g<startDay> \g<finishDay>-\g<month>-\g<year>"
input_text = re.sub(detection_regex_obligatory_preposition, date_restructuring_structure, input_text)
print(repr(input_text)) # --> output
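To check all four examples in one run instead of commenting lines in and out, something like the following sketch could be used. The names `final_pattern`, `replacement`, `examples` and `results` are illustrative; the regex and the replacement template are the ones from the final code.

```python
import re

final_pattern = re.compile(
    r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|]"
    r"(?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"
)
replacement = r"\g<startDay> \g<finishDay>-\g<month>-\g<year>"

examples = [
    "05 del 07 del 2000 del 09 hhggh",  # example 0 - should stay as it is
    "04 del 05 del 07 del 2000",        # example 1 - should stay as it is
    "04 05 del 06 de 200",              # example 2 - should be rewritten
    "04 05 del 06 de 20076 55",         # example 3 - should be rewritten
]

# Apply the substitution to every example; inputs without a match come back unchanged.
results = [final_pattern.sub(replacement, text) for text in examples]

for before, after in zip(examples, results):
    print(repr(before), "->", repr(after))
```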