How to remove extra parentheses if, and only if, they enclose text matching a regex pattern?

Time:11-23

import re, datetime

input_text = "hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)"

identify_dates_regex_00 = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"

restructuring_structure_00 = "(" + r"\g<year>_-_\g<month>_-_\g<startDay>" + r" \g<hh>:\g<mm> \g<am_or_pm>" + ")"

input_text = re.sub(r"\(" + identify_dates_regex_00 + " " + identify_time_regex + r"\)", restructuring_structure_00, input_text)

print(repr(input_text)) # --> output

This is the wrong output that I get (the text comes back unchanged):

'hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)'

This is the correct output, without the extra parentheses, that I need:

'hhhh ((44_-_44)) ggj (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 00:00 am)'
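A small diagnostic sketch (not part of the original post) that reuses the question's own patterns to show why nothing changes: the pattern requires exactly one "(" and one ")" around the date-time, so each match covers only the innermost pair, and Match.expand() shows what re.sub would put in its place.

```python
import re

input_text = "hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)"

identify_dates_regex_00 = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"
restructuring_structure_00 = "(" + r"\g<year>_-_\g<month>_-_\g<startDay>" + r" \g<hh>:\g<mm> \g<am_or_pm>" + ")"

# The question's pattern: exactly one "(" and one ")" around the date-time.
pattern = re.compile(r"\(" + identify_dates_regex_00 + " " + identify_time_regex + r"\)")

for m in pattern.finditer(input_text):
    # m.group(0) is the matched text; m.expand() is what re.sub would insert instead.
    print(repr(m.group(0)), "->", repr(m.expand(restructuring_structure_00)))
```

Each match is replaced by identical text, and the surplus parentheses lie outside every match, so re.sub returns the input unchanged.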

I need to remove the unnecessary parentheses only when the text between them has the structure year_-_month_-_day hour:minute am/pm. With named capture groups, that structure is the date pattern, a space, and the time pattern:

(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2}) (?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))

Without capture groups it can be written as a simpler regex (although then the individual fields can no longer be captured):

\d*_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m
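A quick sketch to check which strings the simpler, capture-free pattern accepts. The first three samples come from input_text; the last one is a hypothetical string added to show that inside a character class | is a literal pipe rather than "or", so [\s|]* also accepts 20:00|pm. If only whitespace is intended, \s* would be enough.

```python
import re

# Capture-free version of the date-time pattern from the question.
simple_regex = r"\d*_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m"

samples = [
    "2022_-_02_-_18 20:00 pm",  # from input_text
    "2022_-_02_-_18 00:00 am",  # from input_text
    "44_-_44",                  # from input_text, should NOT match
    "2022_-_02_-_18 20:00|pm",  # hypothetical: '|' in [...] is a literal pipe
]

for s in samples:
    # fullmatch() succeeds only if the entire string fits the pattern.
    print(repr(s), bool(re.fullmatch(simple_regex, s)))
```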

Answer:

You can use a single capture group to capture the date and time together with one pair of parentheses around it, and match any extra surrounding parentheses outside that group so that they are removed.

You don't need the named capture groups to do the replacement. Use capture group 1 (\1) as the replacement string.
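If you still want the named groups from the question (for example to reformat the fields later), a sketch along the same lines wraps the question's patterns with \(* and \)* and rebuilds the kept text from the groups. This is an alternative to the answer's regex, not part of it.

```python
import re

input_text = "hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)"

# Named-group patterns, unchanged from the question.
identify_dates_regex_00 = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"

# \(* and \)* consume any extra parentheses around the single "(...)" pair,
# so they are not carried over into the replacement.
pattern = r"\(*\(" + identify_dates_regex_00 + " " + identify_time_regex + r"\)\)*"

# Rebuild the kept text from the named groups (same template as the question).
restructuring_structure_00 = "(" + r"\g<year>_-_\g<month>_-_\g<startDay>" + r" \g<hh>:\g<mm> \g<am_or_pm>" + ")"

result = re.sub(pattern, restructuring_structure_00, input_text)
print(result)
```

Because the replacement is built from the groups, the template could also reorder or reformat the fields, for example by writing \g<startDay> first.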

\(*(\(\d{4}_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m\))\)*
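The same pattern written with re.VERBOSE, so each part can carry a comment. Note that this regex matches the year with \d{4}, whereas the question used \d*.

```python
import re

# The answer's pattern, rewritten with re.VERBOSE so each piece can carry a comment.
pattern = re.compile(r"""
    \(*                        # extra opening parentheses (matched, then dropped)
    (                          # group 1: the part that is kept
        \(                     # exactly one opening parenthesis
        \d{4}_-_\d{2}_-_\d{2}  # date: year_-_month_-_day (four-digit year)
        [ ]                    # literal space (bare spaces are ignored in VERBOSE mode)
        \d{2}:\d{2}            # time: hh:mm
        [\s|]*                 # optional whitespace (or literal pipes) before am/pm
        [ap]m                  # am or pm
        \)                     # exactly one closing parenthesis
    )
    \)*                        # extra closing parentheses (matched, then dropped)
    """, re.VERBOSE)

input_text = "hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)"

# Replace each whole match with group 1 only.
result = pattern.sub(r"\1", input_text)
print(result)
```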


Example code:

import re

input_text = "hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)"
pattern = r"\(*(\(\d{4}_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m\))\)*"
print(re.sub(pattern, r"\1", input_text))

Output:

hhhh ((44_-_44)) ggj (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 00:00 am)