How can I fix this regex so that, for the input strings below, I get the outputs listed after them?
out = re.sub(r"(hs|h.s|h.s.)a m(\W|\b)", r"\1 am\2", out)
print(repr(out))
Input string examples...
#example 1.1
colloquial_hour = "Cerca de las 2: hs a m, hay que salir antes de esas hs a m"
#example 1.2
colloquial_hour = "A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m."
#example 1.3
colloquial_hour = "A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m"
#example 1.4
colloquial_hour = "A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m."
correct outputs:
#correct output for example 1.1
"Cerca de las 2: hs am, hay que salir antes de esas hs a m"
#correct output for example 1.2
"A medida que avance cerca de la media noche 12: 04 hs am. Deben ir a las 15 hs am."
#correct output for example 1.3
"A mmm... cerca de las 12: h.s am, hay que salir antes de esas h.s. a m"
#correct output for example 1.4
"A medida que avance cerca de las 12:04 hs. am. Deben ir a las 15 h.s am."
The logic should be: whenever a numeric value is followed by "a m" (optionally with a colon and/or an hours abbreviation in between), replace that "a m" substring with the string "am" in the original string.
These would be all the possible cases where you have to replace the substring "a m" with "am"
X a m
X: a m
X: hs a m
X: h.s. a m
X: h.s a m
X: hs. a m
X : a m
X : hs a m
X : h.s. a m
X : h.s a m
X : hs. a m
X hs a m
X h.s. a m
X h.s a m
X hs. a m
#where "X" is a numerical value ("1", "2", "3", "4", "5", "6", ... )
#in all these cases, in which this pattern is met, "a m" must be replaced by "am"
CodePudding user response:
You can search using regex:
(\d\W+)(h\.?s\.?\s+)?a\s+m\b
and replace using:
\1\2am
RegEx Details:
(\d\W+) : Match a digit followed by 1+ non-word characters, in capture group #1
(h\.?s\.?\s+)? : Match h followed by s, each with an optional dot after it, then 1+ whitespace characters. This optional group is capture group #2
a\s+m\b : Match a followed by 1+ whitespace characters, then m with a word boundary
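As a rough sketch, this search/replace pair can be applied in Python with re.sub, assuming the pattern is written with + quantifiers after \W and \s. The function name fix_am and the variable names are made up for illustration; the strings are the four examples from the question.

```python
import re

# Pattern from the answer above; assumes "+" after \W and \s.
pattern = re.compile(r"(\d\W+)(h\.?s\.?\s+)?a\s+m\b")

examples = [
    "Cerca de las 2: hs a m, hay que salir antes de esas hs a m",
    "A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m.",
    "A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m",
    "A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m.",
]


def fix_am(text):
    # \g<1> and \g<2> are the explicit spellings of \1 and \2.
    # Since Python 3.5, an optional group that did not take part in
    # the match is substituted as an empty string.
    return pattern.sub(r"\g<1>\g<2>am", text)


results = [fix_am(s) for s in examples]
for line in results:
    print(repr(line))
```

Because the match has to start at a digit, the trailing "esas hs a m" and "esas h.s. a m" in examples 1.1 and 1.3 are left alone.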
CodePudding user response:
My solution uses re.sub
import re
phrases = ["Cerca de las 2: hs a m, hay que salir antes de esas hs a m",
"A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m.",
"A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m",
"A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m."]
pattern = re.compile(r'\d\s*?:?\s*?h?\.?s?\.?\s(a m)')
for phrase in phrases:
    # Drop the trailing "a m" (3 characters) from the match and append "am"
    print(pattern.sub(lambda x: x.group(0)[:-3] + "am", phrase))
OUTPUT
Cerca de las 2: hs am, hay que salir antes de esas hs a m
A medida que avance cerca de la media noche 12: 04 hs am. Deben ir a las 15 hs am.
A mmm... cerca de las 12: h.s am, hay que salir antes de esas h.s. a m
A medida que avance cerca de las 12:04 hs. am. Deben ir a las 15 h.s am.
CodePudding user response:
You could match:
(\d+\s*:?\s*(?:h\.?s\.?)?)\s*a m\b
The pattern matches:
( : Open capture group 1
\d+\s*:?\s* : Match 1+ digits and an optional : between optional whitespace chars
(?:h\.?s\.?)? : Optionally match hs, h.s, hs. or h.s.
) : Close group 1
\s*a m\b : Match optional whitespace chars and a m, followed by a word boundary
And replace with group 1 followed by a space and am:
\1 am
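A minimal Python sketch of this last pattern, assuming \d+ for the digits and the replacement \1 am (group 1, a space, then am). The helper name fix_am is hypothetical; the inputs are the question's four examples.

```python
import re

# Pattern from the answer above; assumes "\d+" for the numeric value.
pattern = re.compile(r"(\d+\s*:?\s*(?:h\.?s\.?)?)\s*a m\b")

examples = [
    "Cerca de las 2: hs a m, hay que salir antes de esas hs a m",
    "A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m.",
    "A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m",
    "A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m.",
]


def fix_am(text):
    # Keep group 1 (number, optional colon, optional unit) and
    # rewrite the rest of the match as " am".
    return pattern.sub(r"\1 am", text)


results = [fix_am(s) for s in examples]
for line in results:
    print(repr(line))
```

One thing that may be worth checking for your data: when no hs unit is present (for instance "15 a m"), the first \s* sits inside group 1 and keeps the space, so this replacement seems to yield a doubled space ("15  am"). The four examples above all contain a unit and are not affected.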