Regex that extracts the content of a string associated with each time in 'XX:XX am or pm' format

Time:08-02

I'm having trouble writing a regex that extracts the phrase associated with each time in XX:XX am/pm format.

import re

hh, mm, am_pm = "", "", ""  # hour, minute, and am/pm parts of one time, as strings
times_output = []  # must accumulate every time found, in "XX:XX am/pm" format ('X' is a digit)

Pseudo-regex patterns (for this type of input string):

"sense" \s* entre las \s* "XX:XX am or pm" \s* y las \s* "XX:XX am or pm"

"XX:XX am or pm" ---> "sense"
"XX:XX am or pm" ---> "sense"
"sense" a las "XX:XX am or pm"

"XX:XX am or pm" ---> "sense"
"sense" a las "XX:XX am or pm", a las "XX:XX am or pm" o a las "XX:XX am or pm"

"XX:XX am or pm" ---> "sense"
"XX:XX am or pm" ---> "sense"
"XX:XX am or pm" ---> "sense"
(...|.|,|;) \s* "sense1" \s* (a las|de las|) \s* "XX:XX am or pm" \s* "sense2"

"XX:XX am or pm" ---> "sense1"   "sense2"
"22:00 pm"       ---> "ya que a las"   "empieza el show"

In this case the connectors "ya que" ("since") and "a las" ("at") are removed.
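As a rough sketch of how the first pseudo-pattern could become a real Python regex, the version below uses named groups and a lazy `.+?` so the sense stops right before "entre las". The group names `sense`, `start`, and `end`, and the strict time format (two-digit minutes plus am/pm), are assumptions for illustration.

```python
import re

# A time such as "18:00 pm": 1-2 digit hour, colon, 2-digit minutes, am/pm.
TIME = r"\d{1,2}:\d{2}\s*(?:am|pm)"

# Real-regex version of:  "sense" \s* entre las \s* T \s* y las \s* T
# The lazy (.+?) grows only until the rest of the pattern can match.
RANGE_PATTERN = re.compile(
    rf"(?P<sense>.+?)\s*\bentre las\s*(?P<start>{TIME})\s*\by las\s*(?P<end>{TIME})"
)

text = "puede ser peligroso salir entre las 18:00 pm y las 20:00 pm hs"
m = RANGE_PATTERN.search(text)
if m:
    print(m.group("sense"), "|", m.group("start"), "|", m.group("end"))
```

Each of the other pseudo-patterns would need its own alternative, which is why splitting the string on the times (as in the answer) tends to be simpler than one regex per phrase shape.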

A regex pattern that extracts the times from the input sentence, regardless of what comes before or after them:

Example 1:

input_text = "puede ser peligroso salir entre las 18:00 pm y las 20:00 pm hs, por ello yo pienso que seria mejor salir a las 21:00 pm, a las 21:15 pm o a las 21:30 pm ya que a las 22:00 pm empezaria el show"

#sense_pattern = r"(?P()\s. ?)" #THE REGEX THAT I NEED
civil_time_pattern = r'(\d{1,2})[\s|:]*(\d{0,2})\s*(am|pm)?'

input_text_all_in_minus = input_text.lower()  # lowercase copy of the input
#civil_time_unit_list = re.search(civil_time_pattern, input_text_all_in_minus)
civil_time_unit_list = re.findall(civil_time_pattern, input_text_all_in_minus)
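Because minutes and am/pm are optional in `civil_time_pattern`, `re.findall` can also match numbers that are not times. The sketch below compares it with a stricter pattern on a made-up sentence and fills `times_output` from every match rather than only `[0]`. It assumes `input_text_all_in_minus` is simply the lowercased input; the stricter pattern is a suggestion, not part of the question.

```python
import re

# Hypothetical input; assumption: input_text_all_in_minus is just the lowercased text.
input_text = "Salir entre las 18:00 pm y las 20:00 pm hs, con 3 amigos"
input_text_all_in_minus = input_text.lower()

# The question's pattern: minutes and am/pm are optional, so "3 am(igos)" matches too.
loose_pattern = r'(\d{1,2})[\s|:]*(\d{0,2})\s*(am|pm)?'
loose = re.findall(loose_pattern, input_text_all_in_minus)

# Stricter sketch: require "H:MM" or "HH:MM" followed by a whole-word am/pm.
strict_pattern = r'\b(\d{1,2}):(\d{2})\s*(am|pm)\b'
strict = re.findall(strict_pattern, input_text_all_in_minus)

# Accumulate every match (not only the first) in "XX:XX am/pm" format.
times_output = [f"{hh}:{mm} {am_pm}" for hh, mm, am_pm in strict]

print(loose)         # [('18', '00', 'pm'), ('20', '00', 'pm'), ('3', '', 'am')]
print(strict)        # [('18', '00', 'pm'), ('20', '00', 'pm')]
print(times_output)  # ['18:00 pm', '20:00 pm']
```

With the strict pattern every group is always filled, so the `"00"`/`"am"` defaults below only matter if you keep the looser pattern.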

Validate the hours, minutes, and period (am or pm), falling back to defaults when a value is missing (this step concerns only the time regex):

try:
    hh = civil_time_unit_list[0][0] or "00"
except IndexError:
    hh = "00"
try:
    mm = civil_time_unit_list[0][1] or "00"
except IndexError:
    mm = "00"
try:
    am_pm = civil_time_unit_list[0][2] or "am"
except IndexError:
    am_pm = "am"

time_output = (hh + ":" + mm + " " + am_pm).strip()
# remove unnecessary connectors from the <<sense>> (the text the missing regex should capture)
sense = sense.replace("entre las", "")
sense = sense.replace("y las", "")
sense = sense.replace("a las", "")
sense = sense.replace("ya que", "")
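Chained `str.replace` removes substrings anywhere, so `"a las"` would also be cut out of a phrase like `"para las"`. A minimal sketch, assuming the same connector list plus `"de las"` (from the pseudo-pattern) and `"o a las"` (used in the answer), does it with one word-bounded regex and tidies the leftover spaces. The helper name `clean_sense` is hypothetical.

```python
import re

# Connector phrases from the question; "de las" comes from its pseudo-pattern,
# and "o a las" is included so the "o" does not survive when "a las" is removed.
CONNECTORS = ["entre las", "o a las", "de las", "y las", "a las", "ya que"]
# \b ... \b restricts matches to whole words, so "para las" is left untouched.
CONNECTOR_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, CONNECTORS)) + r")\b")


def clean_sense(sense):
    """Remove connector phrases, collapse whitespace, and trim edge punctuation."""
    sense = CONNECTOR_RE.sub(" ", sense)
    return re.sub(r"\s+", " ", sense).strip(" ,;.")


print(clean_sense("por ello yo pienso que seria mejor salir a las"))
print(clean_sense("nos vemos para las"))
```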

Then create one file per time, named after that time, and write the associated sense into it:

time_output_file = time_output + ".txt"
with open(time_output_file, 'w') as f:
    f.write(sense)

For this example, the files should end up like this:

18:00 pm.txt ----> 'puede ser peligroso salir'
20:00 pm.txt ----> 'puede ser peligroso salir'
21:00 pm.txt ----> 'por ello yo pienso que seria mejor salir'
21:15 pm.txt ----> 'por ello yo pienso que seria mejor salir'
21:30 pm.txt ----> 'por ello yo pienso que seria mejor salir'
22:00 pm.txt ----> 'empezaria el show'
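Opening each file with `'w'` truncates it, so if the same time appeared twice only the last sense would survive. One possible sketch groups the (time, sense) pairs first and writes each file once with `pathlib`; the function name `write_sense_files` and the newline-separated layout are assumptions. File names keep the question's `18:00 pm.txt` form, which works on Linux and macOS, but Windows does not allow `:` in file names, so you would need to replace it there.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def write_sense_files(pairs, out_dir):
    """Group (time, sense) pairs by time and write one '<time>.txt' per time.

    Repeated times get their senses on separate lines instead of the last
    one overwriting the others. Returns the sorted .txt names in out_dir.
    """
    by_time = defaultdict(list)
    for time_output, sense in pairs:
        by_time[time_output].append(sense)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for time_output, senses in by_time.items():
        (out / f"{time_output}.txt").write_text("\n".join(senses), encoding="utf-8")
    return sorted(p.name for p in out.glob("*.txt"))


pairs = [("18:00 pm", "puede ser peligroso salir"), ("20:00 pm", "puede ser peligroso salir")]
with tempfile.TemporaryDirectory() as tmp:
    print(write_sense_files(pairs, tmp))  # ['18:00 pm.txt', '20:00 pm.txt']
```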

Answer:

This is a tricky little problem. Once you remove the connecting phrases, it's just a matter of splitting the string on the times. The code below does most of what you want, although it won't handle every unusual input. In particular, everywhere else the sense comes before the time, so there doesn't seem to be a consistent reason to use "empezaria el show" for the final time; this code pairs it with "ya que" instead. It also leaves the "hs" that follows 20:00 pm at the start of the next sense.

import re
sense = "puede ser peligroso salir entre las 18:00 pm y las 20:00 pm hs, por ello yo pienso que seria mejor salir a las 21:00 pm, a las 21:15 pm o a las 21:30 pm ya que a las 22:00 pm empezaria el show"

sense = sense.replace("entre las", "") \
        .replace("y las", "") \
        .replace("o a las","") \
        .replace("a las", "") \
        .replace(",","")

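civil_time_pattern = r'(\d{1,2}:\d{1,2}\s+(?:am|pm))'
content = ""  # last sense seen; stays empty if the text starts with a time
for m in re.split(civil_time_pattern, sense):
    m = m.strip()
    if m:
        if m[0].isdigit():  # a time kept by the capturing group in the split pattern
            print(f"Write '{content}' to {m}.txt")
        else:  # text between times becomes the current sense
            content = m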

Output:

Write 'puede ser peligroso salir' to 18:00 pm.txt
Write 'puede ser peligroso salir' to 20:00 pm.txt
Write 'hs por ello yo pienso que seria mejor salir' to 21:00 pm.txt
Write 'hs por ello yo pienso que seria mejor salir' to 21:15 pm.txt
Write 'hs por ello yo pienso que seria mejor salir' to 21:30 pm.txt
Write 'ya que' to 22:00 pm.txt
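If you do want the behavior from the question's last pseudo-pattern (sense1, "a las", time, sense2, with "ya que" removed), one heuristic is: a time preceded only by a connector ("y las", ", a las", "o a las") reuses the previous sense; a time introduced by "ya que" takes the text that follows it; and a "hs" right after a time is swallowed by a non-capturing group so `re.split` drops it. This is a sketch tuned to this example, not a general parser, and the function names are hypothetical.

```python
import re

# Assumptions: times look like "18:00 pm", and a trailing "hs" (horas) belongs to
# the time, so the non-capturing group swallows it and re.split drops it.
TIME_SPLIT = re.compile(r"(\d{1,2}:\d{2}\s*(?:am|pm))(?:\s*hs\b)?")
# Connector phrases from the question, plus "o a las" from the answer above.
CONNECTORS = re.compile(r"\b(?:entre las|de las|o a las|y las|a las|ya que)\b")


def clean(text):
    """Drop connector phrases, collapse whitespace, trim punctuation at the ends."""
    return re.sub(r"\s+", " ", CONNECTORS.sub(" ", text)).strip(" ,;.")


def senses_by_time(text):
    """Return (time, sense) pairs using the heuristic described above."""
    # One capturing group, so re.split alternates: text, time, text, ..., time, text.
    parts = TIME_SPLIT.split(text.lower())
    texts, times = parts[0::2], parts[1::2]
    pairs, current = [], ""
    for i, t in enumerate(times):
        before = texts[i]
        if re.search(r"\bya que\b", before):
            # "... ya que a las T <sense2>": the sense comes after the time.
            pairs.append((t, clean(texts[i + 1])))
            continue
        cleaned = clean(before)
        if cleaned:  # a new sense; an empty one ("y las T", ", a las T") reuses the last
            current = cleaned
        pairs.append((t, current))
    return pairs


input_text = (
    "puede ser peligroso salir entre las 18:00 pm y las 20:00 pm hs, por ello yo "
    "pienso que seria mejor salir a las 21:00 pm, a las 21:15 pm o a las 21:30 pm "
    "ya que a las 22:00 pm empezaria el show"
)
for t, sense in senses_by_time(input_text):
    print(f"{t}.txt ----> '{sense}'")
```

For this input it prints the same six lines as the expected file listing in the question, with "hs" gone and 22:00 pm paired with "empezaria el show".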