Create patterns to detect the occurrence of sequences so that you can restrict the cases in which to replace

These are the input string examples:

#example 1.1
colloquial_hour = "Hola nos vemos a las diez y veinte a m, ten en cuenta que al amanecer tendremos que estar despiertos, porque debemos estar alli a eso de nueve a m o las diez y cuarto a m"
#example 1.2
colloquial_hour = "A mi me parece entre las 10 15 am y las 11 a m, o a las 15 a m aunque quizas a medianoche este bien a eso de las 00:00 a m"
#example 1.3
colloquial_hour = "Puede que a las 10 am. Hay 10 a medias, a m mmm... creo que en 10 estarian para terminar a las 11:00 hs a m 11:59 a m"
#example 1.4
colloquial_hour = "Amediados a mediados del 30 antes de y dia; me parace que hay que estar en casa. Medianamente a, mediados de las 05 a m o cerca de 6 a m."

I have tried a simple replacement, but I think the cases must be restricted further with a regex pattern so that unwanted replacements are not made:

colloquial_hour = colloquial_hour.replace('a m', 'am ')

Instead, I want to obtain the following correct output for each of these examples:

#example 1.1
colloquial_hour = "Hola nos vemos a las diez y veinte am, ten en cuenta que al amanecer tendremos que estar despiertos, porque debemos estar alli a eso de nueve am o las diez y cuarto am"
#example 1.2
colloquial_hour = "A mi me parece entre las 10 15 am y las 11 am, o a las 15 am aunque quizas a medianoche este bien a eso de las 00:00 am"
#example 1.3
colloquial_hour = "Puede que a las 10 am. Hay 10 a medias, a m mmm... creo que en 10 estarian para terminar a las 11:00 hs am 11:59 am"
#example 1.4
colloquial_hour = "Amediados a mediados del 30 antes de y dia; me parace que hay que estar en casa. Medianamente a, mediados de las 05 am o cerca de 6 am."
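For comparison with these expected outputs, here is what the plain `str.replace` call from the question does to a short sample assembled from examples 1.3 and 1.4 (the sample string is an assumption for illustration). Every "a m" substring is rewritten, including the ones inside "a medias" and "a mediados" and the "a m mmm" that should stay untouched:

```python
# The plain replacement from the question, applied to a short sample that
# mixes a real time ("05 a m") with ordinary words that contain "a m".
text = "Hay 10 a medias, a m mmm... a mediados de las 05 a m"
naive = text.replace('a m', 'am ')
print(repr(naive))  # 'Hay 10 am edias, am  mmm... am ediados de las 05 am '
```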

In this case, the pseudo-pattern is: a number (written with digits or as a word, as in "nueve a m"), then "a m", which should be replaced with "am", then one or more spaces, a period, a comma, or the end of the string.
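A minimal sketch of that pseudo-pattern with Python's `re` module, assuming the "number" is either one or two digits (optionally followed by `:mm`) or a Spanish number word. `NUMBER_WORDS`, `NUMBER`, `PATTERN` and `fix_am` are hypothetical names, and the word list is an assumption covering the words in the examples plus a few obvious ones. The lookahead `(?=[\s.,]|$)` checks for a space, period, comma or end of string without consuming it, so "10 a medias" is left alone. This sketch does not handle the "hs" prefix in example 1.3 ("11:00 hs a m").

```python
import re

# Hypothetical list of Spanish number words that may precede "a m"; it is an
# assumption covering the examples (veinte, nueve, cuarto) plus a few more.
NUMBER_WORDS = [
    "una", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho",
    "nueve", "diez", "once", "doce", "veinte", "treinta", "cuarto", "media",
]

# A "number": 1-2 digits with optional ":mm" (e.g. "6", "00:00") or a word above.
NUMBER = r"(?:\d{1,2}(?::\d{2})?|" + "|".join(NUMBER_WORDS) + r")"

# number, whitespace, "a m", then a space, '.', ',' or the end of the string.
# The lookahead checks that character without consuming it.
PATTERN = re.compile(rf"\b({NUMBER})\s+a m(?=[\s.,]|$)")


def fix_am(text: str) -> str:
    """Replace '<number> a m' with '<number> am'."""
    return PATTERN.sub(r"\1 am", text)


print(fix_am("las 05 a m o cerca de 6 a m."))
print(fix_am("Hay 10 a medias, a m mmm"))  # left unchanged
```

The replacement collapses any run of whitespace between the number and "a m" into a single space.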

Cases should also be considered where times may be written incompletely, with "am" preceded by ":", " :", ": ", " hs", "hs", "hs ", " h.s. ", "h.s.", "h.s. ", " h.s", "h.s" or "h.s ". For example:

input_t = "a las 12: a m"
output = "a las 12: am"

input_t = "a las 12 : a m"
output = "a las 12 : am"

input_t = "a las 12 hs a m"
output = "a las 12 hs am"

input_t = "a las 12:hs a m"
output = "a las 12:hs am"

input_t = "a las 12: hs a m"
output = "a las 12: hs am"

input_t = "a las 12hsa m"
output = "a las 12hs am"

input_t = "a las 12h.sa m"
output = "a las 12h.s am"

input_t = "a las 12 h.sa m"
output = "a las 12 h.s am"

input_t = "a las 12 h.s.a m"
output = "a las 12 h.s. am"
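One way to cover these prefixed cases is to capture the hour together with its optional ":" and "hs"/"h.s"/"h.s." markers, allow optional whitespace before "a m", and rewrite the match as the captured prefix, one space and "am". This is a sketch under the assumption that the hour is written with digits; `PREFIXED_AM` and `fix_prefixed_am` are hypothetical names.

```python
import re

# Hypothetical pattern: hour as 1-2 digits with optional ":mm", then an optional
# ":" and an optional "hs" / "h.s" / "h.s." marker, each possibly preceded by
# spaces. Group 1 captures the hour plus markers, without trailing whitespace.
PREFIXED_AM = re.compile(
    r"\b(\d{1,2}(?::\d{2})?(?:\s*:)?(?:\s*h\.?s\.?)?)"  # hour + optional markers
    r"\s*a m"                                           # the split "a m"
    r"(?=[\s.,]|$)"                                     # space, '.', ',' or end
)


def fix_prefixed_am(text: str) -> str:
    """Rewrite '<hour><markers> a m' as '<hour><markers> am'."""
    return PREFIXED_AM.sub(r"\1 am", text)


for sample in ["a las 12: a m", "a las 12 : a m", "a las 12hsa m", "a las 12 h.s.a m"]:
    print(fix_prefixed_am(sample))
```

Because group 1 stops before any trailing whitespace, glued forms such as "12hsa m" gain a space ("12hs am") while spaced forms keep exactly one. The same function also turns "11:00 hs a m 11:59 a m" from example 1.3 into "11:00 hs am 11:59 am".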

Answer:

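For the first part, I wrote this regex:

import re

out = re.sub(r"(\d{1,2}\W)a m(\W|\b)", r"\1am\2", colloquial_hour)

It changes "a m" to "am" when it follows a one- or two-digit number and a single non-word character (such as a space), keeping whatever comes before and after.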

For "hs" or "h.s", I did this:

out = re.sub(r"(hs|h\.s\.?)a m(\W|\b)", r"\1 am\2", colloquial_hour)

It searches for "hs", "h.s" or "h.s." directly before "a m" and inserts a space between them. The two regexes are similar enough that you can combine them into one, or you can apply them sequentially.
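A minimal sketch of combining the two regexes into a single pattern, as suggested above (the names `COMBINED` and `fix_am_combined` are hypothetical). The alternation accepts either a one- or two-digit hour followed by one non-word character, or an "hs"/"h.s"/"h.s." suffix glued to "a m". A replacement function strips trailing whitespace from the captured prefix and adds exactly one space, so both branches end up as "<prefix> am".

```python
import re

# Either an hour plus one non-word character, or an "hs"/"h.s"/"h.s." suffix,
# immediately followed by "a m" and a word boundary.
COMBINED = re.compile(r"(\d{1,2}\W|hs|h\.s\.?)a m\b")


def fix_am_combined(text: str) -> str:
    # Strip trailing whitespace from the captured prefix and add exactly one
    # space, so "05 a m" -> "05 am" and "12hsa m" -> "12hs am".
    return COMBINED.sub(lambda m: m.group(1).rstrip() + " am", text)


print(fix_am_combined("las 05 a m o cerca de 6 a m."))
print(fix_am_combined("a las 12 h.s.a m"))
```

Like the two separate regexes, this still leaves "12 : a m" and "11:00 hs a m" unchanged, because only one non-word character is allowed between the hour and "a m", and "hs" must touch "a m".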

For the last part involving hours and

And in this case, the pseudo-pattern is: the start of the string or one of these options (?:las|la |antes de |despues de ) the word string to replace

I'm not sure I understand what you want to do here. If you provide a little more information, I can take a look.