How to reorder data from a character string with re.sub only in cases where it detects a certain reg

Time:11-20

import re

#example
input_text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'


identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"

restructuring_structure_00 = r"(\g<hh>----\g<mm>----\g<am_or_pm>)"

#replacement
input_text = re.sub(identify_time_regex, restructuring_structure_00, input_text)


print(repr(input_text)) # --> output

I need to change identify_time_regex so that it rewrites a time only when that time is not inside a structure like (2022_-_02_-_18 00:16 am), that is, only when it is not preceded by a date. The structure to exclude can be generalized as follows:

r"(\d*_-_\d{2}_-_\d{2}) " + identify_time_regex

This is the output I need. Note that only the times with no date before them were modified:

input_text = 'Alrededor de las 00----16----am o las 23----30----pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'

CodePudding user response:

You can use

import re

input_text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'
identify_time_regex = r"(\b\d{4}_-_\d{2}_-_\d{2}\s+)?(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>[ap]m)"
restructuring_structure_00 = lambda x: x.group() if x.group(1) else fr"{x.group('hh')}----{x.group('mm')}----{x.group('am_or_pm')}"
input_text = re.sub(identify_time_regex, restructuring_structure_00, input_text)
print(input_text)
# Alrededor de las 00----16----am o las 23----30----pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)


The logic is the following: if the optional capturing group (\b\d{4}_-_\d{2}_-_\d{2}\s+)? matches, the replacement is the whole match (i.e. the text is left unchanged), and if it does not match, your replacement takes place.
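To see that logic in action, here is a minimal sketch that lists what the optional group captures for each match. The shortened sample string and the names text, pattern and matches are made up for illustration; the pattern itself is the one from the answer.

```python
import re

# Shortened sample (hypothetical): one bare time, one time preceded by a date.
text = "a las 00:16 am, pero no a las (2022_-_02_-_18 00:16 am)"

pattern = re.compile(
    r"(\b\d{4}_-_\d{2}_-_\d{2}\s+)?"                          # group 1: optional date prefix
    r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>[ap]m)"   # the time itself
)

# Collect group 1 plus the named groups for every match.
matches = [
    (m.group(1), m.group("hh"), m.group("mm"), m.group("am_or_pm"))
    for m in pattern.finditer(text)
]

for date_prefix, hh, mm, am_pm in matches:
    # group 1 is None when no date precedes the time
    action = "keep as is" if date_prefix else "rewrite"
    print(repr(date_prefix), hh, mm, am_pm, "->", action)
```

Group 1 comes back as None for the bare time and as the date text for the one in parentheses, which is exactly what the replacement callable tests.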

The restructuring_structure_00 replacement must be a callable (here, a lambda expression) rather than a template string, since the match object has to be inspected before the replacement text can be chosen.
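If the one-line lambda is hard to read, the same thing can be written as a regular function, since re.sub accepts any callable that takes a match object and returns a string. This is only a sketch: the names TIME_RE, restructure_time and result are hypothetical, not from the question.

```python
import re

TIME_RE = re.compile(
    r"(\b\d{4}_-_\d{2}_-_\d{2}\s+)?"
    r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>[ap]m)"
)

def restructure_time(match):
    """Return the match unchanged if a date precedes the time, else reformat it."""
    if match.group(1):
        # A date prefix was captured: give back the original text.
        return match.group()
    return f"{match.group('hh')}----{match.group('mm')}----{match.group('am_or_pm')}"

input_text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'

result = TIME_RE.sub(restructure_time, input_text)
print(result)
```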

The \b\d{4}_-_\d{2}_-_\d{2}\s+ pattern matches a word boundary, four digits, _-_, two digits, _-_, two digits, and one or more whitespace characters.
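As a different technique from the one above, and only under the assumption that the date and the time are always separated by exactly one whitespace character, a negative lookbehind lets you keep a plain template string as the replacement. Python's re module only allows fixed-width lookbehinds, which is why \s+ cannot be used here. The names lookbehind_regex, template and result are hypothetical.

```python
import re

input_text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'

# Assumption: exactly ONE whitespace character between date and time,
# because lookbehind patterns in `re` must have a fixed width.
lookbehind_regex = (
    r"(?<!\d{4}_-_\d{2}_-_\d{2}\s)"
    r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>[ap]m)"
)
template = r"\g<hh>----\g<mm>----\g<am_or_pm>"

result = re.sub(lookbehind_regex, template, input_text)
print(result)
```

If the spacing between date and time can vary, the optional-group approach with a callable is the safer choice.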
