Detect a pattern within a string, extract all the substrings in which that pattern occurs and then m

Time: 08-26

import re

input_text = "03:00 am hay 1  Entre la 1:30 y las 2:0, o 01:02 am  minuto y salimos, 19:30 pm salimos!! es importante llegar alla antes 20 :30 am, ya que 21:00 pm cierran algunos negocios, sin embargo el cine esta abierto hasta 23:30 pm o 01 : 00 am, 1:00 am 1:00 pm, : p m, 1: pm  5: pm"

This is the regex prototype I use to detect the pattern; it should cover the substrings listed below:

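# Note: "|" inside [...] is a literal pipe character, not an "or"; [\s:]* would be enough here.
civil_time_pattern = r'(\d{1,2})[\s|:]*(\d{0,2})\s*(am|pm)?'
# re.findall already returns a list of (hh, mm, am_pm) tuples, one per match
civil_time_unit_list = re.findall(civil_time_pattern, input_text)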

The substrings it must be able to detect in the original input string: ["03:00 am", "1:30", "2:0", "01:02 am", "19:30 pm", "20 :30 am", "21:00 pm", "23:30 pm", "01 : 00 am", "1:00 am", "1:00 pm", ": p m", "1: pm", "5: pm"]
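To check what the prototype pattern actually picks up, `re.finditer` can list every match together with its position in the string. This diagnostic sketch uses only the standard `re` module plus the pattern and text from the question; it shows that the pattern also matches the bare "1" in "hay 1" (together with the spaces after it), that "1:30 " is matched with its trailing space, and that ": p m" is never matched, because every match has to start with a digit.

```python
import re

# The prototype pattern and input text, copied unchanged from the question.
civil_time_pattern = r'(\d{1,2})[\s|:]*(\d{0,2})\s*(am|pm)?'
input_text = "03:00 am hay 1  Entre la 1:30 y las 2:0, o 01:02 am  minuto y salimos, 19:30 pm salimos!! es importante llegar alla antes 20 :30 am, ya que 21:00 pm cierran algunos negocios, sin embargo el cine esta abierto hasta 23:30 pm o 01 : 00 am, 1:00 am 1:00 pm, : p m, 1: pm  5: pm"

# finditer yields one Match object per non-overlapping hit, so we can see
# exactly which text each match covers and where it sits in the string.
matches = list(re.finditer(civil_time_pattern, input_text))
for m in matches:
    print(m.span(), repr(m.group(0)), m.groups())
```

Printing `repr(m.group(0))` makes the trailing whitespace that each match swallows visible, which matters once these spans are replaced.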

This is the conversion that must be applied to each substring (hh:mm am or pm) that the pattern detects in input_text. One of the problems with this code is how to apply these replacements only where the regex above actually matches.

# Block of code that should receive the substring, chunk it and try to correct it,
# then later replace the corrected version in the original string.
# Written as a function that takes one (hh, mm, am_pm) tuple produced by re.findall.
def correct_time(civil_time_unit):
    try:
        hh = civil_time_unit[0]
        if hh == "": hh = "00"
    except IndexError: hh = "00"

    try:
        mm = civil_time_unit[1]
        if mm == "": mm = "00"
    except IndexError: mm = "00"

    try:
        am_pm = civil_time_unit[2]
        if am_pm == "":
            if 0 <= int(hh) < 12: am_pm = "am"
            elif 12 <= int(hh) < 24: am_pm = "pm"
        else:
            # If it says pm, the indication pm will be prioritized over the hour that is indicated
            # But if it says am the time will be prioritized over the indication of am
            if am_pm == "am":
                if 12 <= int(hh) < 24: am_pm = "pm"
            elif am_pm == "pm":
                if 0 <= int(hh) < 12: hh = str(int(hh) + 12)
    except IndexError:
        if 0 <= int(hh) < 12: am_pm = "am"
        elif 12 <= int(hh) < 24: am_pm = "pm"

    # Add "0" in front, if the substring is not 2 characters long
    if len(hh) < 2: hh = "0" + hh
    if len(mm) < 2: mm = "0" + mm

    output = hh + ":" + mm + " " + am_pm
    return output.strip()
One possible problem is that I do not know how many times the pattern will appear, so I do not know how many substrings will have to be extracted and sent through the correction and replacement process. I also have to consider that the same replacement can occur two or more times.

print(repr(input_text))  # This should print the original string with all the replacements already applied

And this is the output I need; as you can see, the conversion has been applied to each hh:mm am/pm pattern:

input_text = "03:00 am hay 1  Entre la 01:30 am y las 02:00 am, o 01:02 am  minuto y salimos, 19:30 pm salimos!! es importante llegar alla antes 20:30 pm, ya que 21:00 pm cierran algunos negocios, sin embargo el cine esta abierto hasta 23:30 pm o 01:00 am, 01:00 am 13:00 pm, 00:00 am, 13:00 pm  05:00 pm"
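On the worry about not knowing how many times the pattern appears: `re.sub` (and `re.subn`, which also returns the number of replacements made) scans the string left to right and calls the replacement function once for every non-overlapping match, so the count never has to be known in advance, and a substring that occurs twice is simply converted twice. This sketch uses a deliberately simplified pattern and a hypothetical helper `pad` (assumptions, not the question's pattern or conversion) just to show that behaviour:

```python
import re

# Simplified pattern (assumption): hh:mm followed by an optional am/pm marker.
pattern = re.compile(r"(\d{1,2}):(\d{2})(?:\s*(am|pm))?")


def pad(match):
    # Hypothetical helper, called once per match: zero-pads the hour and
    # keeps the am/pm marker if there was one.
    hh, mm, suffix = match.groups()
    return f"{hh.zfill(2)}:{mm}" + (f" {suffix}" if suffix else "")


text = "1:00 pm y 1:00 pm, luego 9:15"
new_text, count = pattern.subn(pad, text)
print(new_text)  # 01:00 pm y 01:00 pm, luego 09:15
print(count)     # 3
```

The repeated "1:00 pm" needs no special handling: each occurrence is its own match and gets its own call to `pad`.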

Answer:

If I understand correctly, you want to replace every matched string with a converted version of itself. You can do that with re.sub by passing it a function: re.sub calls it with each match object, the function builds the converted string from the match groups, and its return value is used as the replacement:


import re

input_text = "03:00 am hay 1  Entre la 1:30 y las 2:0, o 01:02 am  minuto y salimos, 19:30 pm salimos!! es importante llegar alla antes 20 :30 am, ya que 21:00 pm cierran algunos negocios, sin embargo el cine esta abierto hasta 23:30 pm o 01 : 00 am, 1:00 am 1:00 pm, : p m, 1: pm  5: pm"
civil_time_pattern = re.compile(r"(\d{1,2})[\s|:]*(\d{0,2})\s*(am|pm)?")


def convert(match):
    hh = match.group(1) or "00"
    mm = match.group(2) or "00"

    am_pm = match.group(3)
    if not am_pm:
        if 0 <= int(hh) < 12:
            am_pm = "am"
        elif 12 <= int(hh) < 24:
            am_pm = "pm"
    # If it says pm, the indication pm will be prioritized over the hour that is indicated
    # But if it says am the time will be prioritized over the indication of am
    if am_pm == "am":
        if 12 <= int(hh) < 24:
            am_pm = "pm"
    elif am_pm == "pm":
        if 0 <= int(hh) < 12:
            hh = str(int(hh) + 12)

    # Add "0" in front, if the substring is not 2 characters long
    hh = hh.zfill(2)
    mm = mm.zfill(2)

    # Keep the am/pm marker, as in the expected output from the question
    output = f"{hh}:{mm} {am_pm or ''}".rstrip()
    return output


result = civil_time_pattern.sub(convert, input_text)
print(result)
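One caveat with the pattern above: everything after the first digits is optional, so it also matches the bare "1" in "hay 1", and its trailing \s* swallows the space after a time such as "1:30" when no am/pm follows, gluing the replacement to the next word. A minimal sketch of a stricter variant, assuming every time in the text contains a colon; the names `STRICT_TIME` and `convert_strict` are hypothetical:

```python
import re

# Assumption: every time contains a colon, so a lone number such as the "1"
# in "hay 1" is not treated as a time. Whitespace before the minutes or before
# am/pm is consumed only when those parts are actually present.
STRICT_TIME = re.compile(r"(\d{1,2})\s*:(?:\s*(\d{1,2}))?(?:\s*(am|pm)\b)?")


def convert_strict(match):
    hour = int(match.group(1))
    mm = (match.group(2) or "00").zfill(2)
    am_pm = match.group(3)
    if not am_pm:
        # No marker: infer it from the hour.
        am_pm = "am" if hour < 12 else "pm"
    elif am_pm == "am" and 12 <= hour < 24:
        # "am" with an afternoon hour: the hour wins.
        am_pm = "pm"
    elif am_pm == "pm" and hour < 12:
        # "pm" with a morning hour: the marker wins, as the question specifies.
        hour += 12
    return f"{hour:02d}:{mm} {am_pm}"


input_text = "03:00 am hay 1  Entre la 1:30 y las 2:0, o 01:02 am  minuto y salimos, 19:30 pm salimos!! es importante llegar alla antes 20 :30 am, ya que 21:00 pm cierran algunos negocios, sin embargo el cine esta abierto hasta 23:30 pm o 01 : 00 am, 1:00 am 1:00 pm, : p m, 1: pm  5: pm"
result = STRICT_TIME.sub(convert_strict, input_text)
print(result)
```

This sketch leaves ": p m" untouched, since it has no hour digits, and it turns "5: pm" into "17:00 pm" by following the stated pm rule, whereas the expected output in the question shows "05:00 pm" for that case. Those two spots would need extra rules if the exact expected string is required.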