import re
#example
input_text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'
identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"
restructuring_structure_00 = r"(\g<hh>----\g<mm>----\g<am_or_pm>)"
#replacement
input_text = re.sub(identify_time_regex, restructuring_structure_00, input_text)
print(repr(input_text)) # --> output
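For reference, here is the problem in isolation. With the variable name fixed, the original pattern has no notion of a preceding date, so `re.finditer` finds every `hh:mm am/pm` occurrence, including the one that follows `2022_-_02_-_18`. The variable name `text` is only illustrative.

```python
import re

text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'
identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"

# No date check: all three times are found, including the dated one
times = [m.group() for m in re.finditer(identify_time_regex, text)]
print(times)  # ['00:16 am', '23:30 pm', '00:16 am']
```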
I need to change the regex identify_time_regex so that it extracts the hour numbers only when the time is not inside a structure like (2022_-_02_-_18 00:16 am), i.e. when it is not preceded by a date.
That structure can be generalized as the date pattern r"(\d*_-_\d{2}_-_\d{2}) " followed by identify_time_regex.
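A quick check of the generalized structure: concatenating the date pattern with identify_time_regex matches only the time that follows a date, which is exactly the occurrence that should be left alone. The name `date_prefix` is a hypothetical label for illustration. Note that `\d*` also accepts zero digits; the answer's pattern tightens this to `\b\d{4}`.

```python
import re

text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'
identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"
date_prefix = r"(\d*_-_\d{2}_-_\d{2}) "  # hypothetical name for the generalized date

# Date followed by a time: only the dated occurrence matches
dated = [m.group() for m in re.finditer(date_prefix + identify_time_regex, text)]
print(dated)  # ['2022_-_02_-_18 00:16 am']
```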
This is the output I need; note that only the times with no date before them were modified:
input_text = 'Alrededor de las 00----16----am o las 23----30----pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'
Answer:
You can use
import re
input_text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'
identify_time_regex = r"(\b\d{4}_-_\d{2}_-_\d{2}\s+)?(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>[ap]m)"
restructuring_structure_00 = lambda x: x.group() if x.group(1) else fr"{x.group('hh')}----{x.group('mm')}----{x.group('am_or_pm')}"
input_text = re.sub(identify_time_regex, restructuring_structure_00, input_text)
print(input_text)
# Alrededor de las 00----16----am o las 23----30----pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)
The logic is as follows: if the optional capturing group (\b\d{4}_-_\d{2}_-_\d{2}\s+)? matches, the callback returns the whole match unchanged (i.e. no replacement occurs); if it does not match, your replacement takes place.
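To see that decision for each match, a small sketch can list every match together with whether group 1 participated. `m.group(1)` is `None` when the optional date prefix did not take part in the match; the names `decisions`, `whole` and `has_date` are only illustrative.

```python
import re

text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'
# The answer's pattern: optional date prefix in group 1, then the time
identify_time_regex = r"(\b\d{4}_-_\d{2}_-_\d{2}\s+)?(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>[ap]m)"

# group(1) is None when the optional date prefix did not participate
decisions = [(m.group(), m.group(1) is not None)
             for m in re.finditer(identify_time_regex, text)]
for whole, has_date in decisions:
    print(repr(whole), "-> kept" if has_date else "-> rewritten")
```

This should print `'00:16 am' -> rewritten`, `'23:30 pm' -> rewritten` and `'2022_-_02_-_18 00:16 am' -> kept`: the dated time is consumed as one match together with its date, so it is never rewritten.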
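restructuring_structure_00 must be a callable (here a lambda) rather than a replacement string, because the result depends on each match: a string template cannot test whether group 1 participated, whereas re.sub calls the function with every match object and substitutes whatever it returns.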
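Any callable that takes a match object and returns a string works here, not only a lambda. As a sketch, the same logic written as a named function (the name `restructure_time` is hypothetical) may be easier to read and extend:

```python
import re

text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'
identify_time_regex = r"(\b\d{4}_-_\d{2}_-_\d{2}\s+)?(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>[ap]m)"

def restructure_time(match):
    """Hypothetical named callback, equivalent to the answer's lambda."""
    if match.group(1):
        # A date precedes the time: return the whole match unchanged
        return match.group()
    # No date: rewrite "hh:mm am" as "hh----mm----am"
    return f"{match.group('hh')}----{match.group('mm')}----{match.group('am_or_pm')}"

result = re.sub(identify_time_regex, restructure_time, text)
print(result)
```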
The \b\d{4}_-_\d{2}_-_\d{2}\s+ pattern matches a word boundary, four digits, _-_, two digits, _-_, two digits, and one or more whitespace characters.
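The same breakdown can be written directly into the pattern with the `re.VERBOSE` flag, which ignores unescaped whitespace and allows `#` comments. This is a sketch of the date prefix only, checked against one dated time and one string whose year is too short:

```python
import re

# The answer's date prefix, annotated piece by piece
date_prefix = re.compile(r"""
    \b      # word boundary
    \d{4}   # four digits (year)
    _-_     # literal separator
    \d{2}   # two digits (month)
    _-_     # literal separator
    \d{2}   # two digits (day)
    \s+     # one or more whitespace characters
""", re.VERBOSE)

print(bool(date_prefix.match('2022_-_02_-_18 00:16 am')))  # True
print(bool(date_prefix.match('22_-_02_-_18 00:16 am')))    # False
```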