import re, datetime
input_text = "hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)"
identify_dates_regex_00 = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"
restructuring_structure_00 = "(" r"\g<year>_-_\g<month>_-_\g<startDay>" r" \g<hh>:\g<mm> \g<am_or_pm>" ")"
input_text = re.sub(r"\(" + identify_dates_regex_00 + " " + identify_time_regex + r"\)", restructuring_structure_00, input_text)
print(repr(input_text)) # --> output
This is the wrong output that I get:
'hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)'
This is the correct output that I need, without the extra parentheses:
'hhhh ((44_-_44)) ggj (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 00:00 am)'
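A quick way to see why the substitution changes nothing is to check what the question's pattern actually matches. This is a small sketch using only Python's standard re module; pattern, text and m are illustrative names. The pattern allows exactly one "(" before and one ")" after the date-time, so re.search lands on the innermost pair and the extra parentheses stay outside the match:

```python
import re

# The question's combined pattern: exactly one "(" and one ")" around the date-time.
pattern = (r"\((?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2}) "
           r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))\)")

text = "((((2022_-_02_-_18 20:00 pm)))"
m = re.search(pattern, text)

print(m.group(0))         # (2022_-_02_-_18 20:00 pm)
print(text[:m.start()])   # ((( -> left before the match
print(text[m.end():])     # )) -> left after the match
```

Because restructuring_structure_00 rebuilds exactly the text that was matched, re.sub writes back an identical string and the surrounding "(((" and "))" are never touched.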
I need it to remove the unnecessary parentheses whenever they enclose the structure year_-_month_-_day hour:minute am/pm. Using named capture groups (identify_dates_regex_00 followed by a space and identify_time_regex), that structure can be written like this:
(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2}) (?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))
or, without capture groups (although we would lose the ability to capture the data), as a simpler regex:
\d*_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m
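To illustrate the difference between the two forms, here is a minimal sketch with Python's re module (named, plain, sample and the m_* variables are illustrative names). Both patterns accept the same date-time text, but only the named version gives access to the individual fields:

```python
import re

# The two descriptions of the date-time structure from the question.
named = (r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2}) "
         r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))")
plain = r"\d*_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m"

sample = "2022_-_02_-_18 20:00 pm"

m_named = re.fullmatch(named, sample)
m_plain = re.fullmatch(plain, sample)

# Both patterns match the whole sample...
print(m_named is not None, m_plain is not None)  # True True
# ...but only the named version exposes year, month, day, hour, minute, am/pm.
print(m_named.groupdict())
print(m_plain.groups())  # () -> nothing captured
```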
Answer:
You can use a single capture group to capture the date and time together with one pair of parentheses around it, and let the pattern also match (and so discard) any extra parentheses on either side.
To do the replacement, you don't need the named capture groups.
In the replacement, use capture group 1:
\(*(\(\d{4}_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m\))\)*
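The way this pattern handles several opening parentheses can be checked directly. In this short sketch (m and no_match are illustrative names) the leading \(* is greedy and first grabs all four "(", then backtracks by one so that group 1 can start with its own "("; the trailing \)* then consumes the remaining ")":

```python
import re

pattern = r"\(*(\(\d{4}_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m\))\)*"

m = re.search(pattern, "x ((((2022_-_02_-_18 20:00 pm))) y")
# group(0): everything consumed, extra parentheses included
print(m.group(0))  # ((((2022_-_02_-_18 20:00 pm)))
# group(1): the single parenthesised date-time that is kept
print(m.group(1))  # (2022_-_02_-_18 20:00 pm)

# Text without the date-time structure is not matched, so it is left alone.
no_match = re.search(pattern, "((44_-_44))")
print(no_match)  # None
```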
Example code:
import re
input_text = "hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)"
pattern = r"\(*(\(\d{4}_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m\))\)*"
print(re.sub(pattern, r"\1", input_text))
Output
hhhh ((44_-_44)) ggj (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 00:00 am)
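If you want to keep the named groups from the question (for example to reuse the captured fields later), the same idea can be applied to the original code. This is a sketch, not the answer's exact method: it wraps the question's named-group pattern in \(+ and \)+ so every run of parentheses is consumed, and then rebuilds a single pair with restructuring_structure_00. The names pattern and output are illustrative:

```python
import re

input_text = "hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)"

identify_dates_regex_00 = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"
restructuring_structure_00 = r"(\g<year>_-_\g<month>_-_\g<startDay> \g<hh>:\g<mm> \g<am_or_pm>)"

# \(+ and \)+ swallow one or more parentheses on each side of the date-time;
# the replacement template then writes back exactly one pair.
pattern = r"\(+" + identify_dates_regex_00 + " " + identify_time_regex + r"\)+"

output = re.sub(pattern, restructuring_structure_00, input_text)
print(output)
```

One difference from the answer's pattern: because the template inserts a single space before am/pm, any whitespace or "|" matched by [\s|]* is normalised to one space. The year also keeps the question's \d* instead of \d{4}.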