Catch the following capture groups with a regex and then reorder them with re sub method if the patt

Time:10-23

import re

input_text = "05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!
input_text = "04 del 05 del 07 del 2000" #example 1 - Not modify!
input_text = "04 05 del 06 de 200" #example 2 - Yes modify!
input_text = "04 05 del 06 de 20076 55" #example 3 - Yes modify!

detection_regex_obligatory_preposition = r"\d{2}" + r"[\s|](?:del|de[\s|]el|de )[\s|]" + r"\d{2}" + r"[\s|](?:del|de[\s|]el|de )[\s|]" + r"\d*"

year, month, days_intervale_or_day = "", "", "" # = group()[2], group()[1], group()[0]
date_restructuring_structure = days_intervale_or_day + "-" + month + "-" + year
print(repr(date_restructuring_structure))

input_text = re.sub(detection_regex_obligatory_preposition, date_restructuring_structure, input_text)

print(repr(input_text)) # --> output

Correct outputs for each of these cases

""
"05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!

""
"04 del 05 del 07 del 2000" #example 1 - Not modify!

"05-06-200"
"04 05-06-200" #example 2 - Yes modify!

"05-06-20076"
"04 05-06-20076 55" #example 3 - Yes modify!

In example 1 the text should not be replaced, because more than one day is indicated in front of the date. The text looks like \d{2} del \d{2} del \d{2} del \d* rather than \d{2} del \d{2} del \d*

Something similar happens in example 0, where no replacement is needed because the text looks like \d{2} del \d{2} del \d* de \d{2} or \d{2} del \d{2} del \d* de \d* rather than \d{2} del \d{2} del \d*

How can I set up the capture groups and the regex so that the replacement is performed in examples 2 and 3, but not in examples 0 and 1?

CodePudding user response:

Demo: https://regex101.com/r/w7Yp7J/1

import re

#input_text = "05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!
#input_text = "04 del 05 del 07 del 2000" #example 1 - Not modify!
input_text = "04 05 del 06 de 200" #example 2 - Yes modify!
input_text = "04 05 del 06 de 20076 55" #example 3 - Yes modify!

detection_regex_obligatory_preposition = r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|](?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"

date_restructuring_structure = r"\g<startDay> \g<finishDay>-\g<month>-\g<year>" # raw string, so \g is not read as a string escape

input_text = re.sub(detection_regex_obligatory_preposition, date_restructuring_structure, input_text)

print(repr(input_text)) # --> output

To see your code on Regex101, I combined your rules as the following:

\d{2}[\s|](?:del|de[\s|]el|de )[\s|]\d{2}[\s|](?:del|de[\s|]el|de )[\s|]\d*

It grabs exactly the inputs we do not want to modify and misses the ones we do:

05 del 07 del 2000 del 09 hhggh #example 0 - Captured
04 del 05 del 07 del 2000 #example 1 - Captured
04 05 del 06 de 200 #example 2 - Not Captured
04 05 del 06 de 20076 55 #example 3 - Not Captured

To grab the correct inputs, I modified your rule in two places. I added a two-digit number rule (\d{2}) to the beginning, and I removed the trailing space from the second "de" alternative: "de " followed by [\s|] requires two separators in a row, which is why "06 de 200" never matched:

\d{2}[\s|]\d{2}[\s|](?:del|de[\s|]el|de )[\s|]\d{2}[\s|](?:del|de[\s|]el|de)[\s|]\d*
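
The same check can be reproduced outside Regex101 with a short script. This is only a sketch: `question_pattern` is the question's five pieces joined exactly as they appear in the question's code, `modified_pattern` is the rule above, and the helper name `captured` is made up for this illustration.

```python
import re

# The question's pieces, joined as written there
# (both "de " alternatives keep their trailing space).
question_pattern = (
    r"\d{2}"
    r"[\s|](?:del|de[\s|]el|de )[\s|]"
    r"\d{2}"
    r"[\s|](?:del|de[\s|]el|de )[\s|]"
    r"\d*"
)

# The modified rule: a leading \d{2}, and no trailing space in the second "de".
modified_pattern = (
    r"\d{2}[\s|]\d{2}[\s|](?:del|de[\s|]el|de )[\s|]"
    r"\d{2}[\s|](?:del|de[\s|]el|de)[\s|]\d*"
)

examples = [
    "05 del 07 del 2000 del 09 hhggh",  # example 0 - must not be modified
    "04 del 05 del 07 del 2000",        # example 1 - must not be modified
    "04 05 del 06 de 200",              # example 2 - must be modified
    "04 05 del 06 de 20076 55",         # example 3 - must be modified
]


def captured(pattern, texts):
    """Return one boolean per text: does the pattern match anywhere in it?"""
    return [re.search(pattern, text) is not None for text in texts]


before = captured(question_pattern, examples)
after = captured(modified_pattern, examples)

for text, b, a in zip(examples, before, after):
    print(f"{text!r}: question rule={b}, modified rule={a}")
```

Note that [\s|] is a character class, so it accepts either a whitespace character or a literal "|" as the separator.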

Now it grabs the correct inputs, and we can turn to the replacement. There are two ways to refer to a captured group in the replacement string. The first is by number (like \1 \2-\3-\4 in our case): every capturing group gets a number, counted from left to right by its opening parenthesis. The second is by name (like \g<startDay> \g<finishDay>-\g<month>-\g<year> in our case), which I prefer. To refer to groups by name, the pattern has to use named capturing groups, written (?P<startDay>...).
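
To make the two reference styles concrete, here is a small sketch that runs the same substitution both ways on example 2. The variable names (`numbered_pattern`, `named_pattern`, `by_number`, `by_name`) are illustrative only. The replacement strings are raw strings so that Python passes `\1` and `\g<...>` through to `re.sub` untouched.

```python
import re

text = "04 05 del 06 de 200"  # example 2 from the question

# Plain parentheses: groups are referenced by position (\1, \2, ...).
numbered_pattern = (
    r"(\d{2})[\s|](\d{2})[\s|](?:del|de[\s|]el|de )[\s|]"
    r"(\d{2})[\s|](?:del|de[\s|]el|de)[\s|](\d*)"
)

# Named groups: the same rule, but each group carries a label.
named_pattern = (
    r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|]"
    r"(?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"
)

by_number = re.sub(numbered_pattern, r"\1 \2-\3-\4", text)
by_name = re.sub(named_pattern, r"\g<startDay> \g<finishDay>-\g<month>-\g<year>", text)

print(repr(by_number))
print(repr(by_name))
```

Both calls produce the same string; the named form just stays readable when groups are added or reordered later.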

Let's add named capturing groups to our rule:

(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|](?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)
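
One way to see what each named group captures is to inspect the match object directly. The following is a minimal sketch using example 2; the names `pattern` and `parts` are placeholders of mine.

```python
import re

pattern = re.compile(
    r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|]"
    r"(?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"
)

match = pattern.search("04 05 del 06 de 200")

# groupdict() maps every named group to the text it captured.
parts = match.groupdict() if match else {}
print(parts)
```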

The final code:

import re

#input_text = "05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!
#input_text = "04 del 05 del 07 del 2000" #example 1 - Not modify!
input_text = "04 05 del 06 de 200" #example 2 - Yes modify!
input_text = "04 05 del 06 de 20076 55" #example 3 - Yes modify!

detection_regex_obligatory_preposition = r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|](?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"

date_restructuring_structure = r"\g<startDay> \g<finishDay>-\g<month>-\g<year>" # raw string, so \g is not read as a string escape

input_text = re.sub(detection_regex_obligatory_preposition, date_restructuring_structure, input_text)

print(repr(input_text)) # --> output
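
To check all four examples in one run instead of commenting lines in and out, the final code can be wrapped in a small function. This is a sketch under my own naming (`DATE_PATTERN`, `REPLACEMENT`, `restructure_date` and `cases` are not from the question); the expected strings are the outputs listed in the question.

```python
import re

DATE_PATTERN = re.compile(
    r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|]"
    r"(?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"
)
REPLACEMENT = r"\g<startDay> \g<finishDay>-\g<month>-\g<year>"


def restructure_date(text: str) -> str:
    """Rewrite 'DD DD del MM de YYYY' as 'DD DD-MM-YYYY'; leave other text alone."""
    return DATE_PATTERN.sub(REPLACEMENT, text)


# Input text -> output expected by the question.
cases = {
    "05 del 07 del 2000 del 09 hhggh": "05 del 07 del 2000 del 09 hhggh",  # example 0
    "04 del 05 del 07 del 2000": "04 del 05 del 07 del 2000",              # example 1
    "04 05 del 06 de 200": "04 05-06-200",                                 # example 2
    "04 05 del 06 de 20076 55": "04 05-06-20076 55",                       # example 3
}

for text, expected in cases.items():
    result = restructure_date(text)
    print(repr(text), "->", repr(result), "(ok)" if result == expected else "(differs)")
```

Since the year group is \d*, it can also match an empty string; if at least one digit should be required, \d+ would be the stricter choice.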