import re
input_text = "entre las 15 : hs -- 16:10 " #Example 1
input_text = "entre las 21 : -- 22" #Example 2
input_text = "entre la 1 30 -- 2 " #Example 3
input_text = "entre la 1 09 h.s. -- 6 : hs." #Example 4
input_text = "entre la 1:50 -- 6 :" #Example 5
input_text = "entre la 7 59 -- 23 : " #Example 6
input_text = "entre la 10: -- : 10" #Example 7
print(repr(input_text)) #print the output string
The function fix_time_patterns_in_time_intervals() should look something like the draft below, although you may need to catch exceptions for possible index errors.
The function should only do the replacements if the hours (the first group) are at most 23, since there is no such thing as a 25th hour in a day. Likewise, it should only replace the minutes (the second group) if they are at most 59, since an hour cannot have more than 60 minutes and the 60th minute is already minute 0 of the next hour. Because of this, the replacements should only be made under the conditions tested inside the function; otherwise it should return the substring that was extracted from the original string unchanged.
def fix_time_patterns_in_time_intervals(match_num_time):
    hour_exist = False
    if(int(match_num_time[1]) <= 23):
        #do the replacement process
        if(len(match_num_time[1]) == 1): match_num_time[1] = "0" + str(match_num_time[1])
        elif(len(match_num_time[1]) == 0): match_num_time[1] = "00"
        hour_exist = True
    if(int(match_num_time[2]) <= 59):
        #do the replacement process
        if(len(match_num_time[2]) == 1): match_num_time[2] = "0" + str(match_num_time[2])
        elif(len(match_num_time[2]) == 0): match_num_time[2] = "00"
    elif( (int(match_num_time[2]) == None) and (hour_exist == True) ):
        #do the replacement process
        match_num_time[2] = "00"
    return match_num_time #the extracted substring
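Two details keep a draft like this from working when it receives a re.Match object (which is what re.sub passes to a replacement function): a Match lets you read a group with m[1] but not assign to it, and int('') raises ValueError rather than returning None, so the int(...) == None branch can never run. The sketch below demonstrates both and then builds a new "HH:MM" string instead of mutating the match. The simplified pattern and the helper name normalize_time are assumptions for illustration, not part of the question's code.

```python
import re

# Illustrative pattern only: hour digits, optional colon, optional minute digits.
m = re.search(r"(\d*):?(\d*)", "6:")

# 1) Match objects are read-only, so assigning to m[1] raises TypeError.
try:
    m[1] = "06"
except TypeError as exc:
    assignment_error = exc

# 2) A group that matched nothing is '', and int('') raises ValueError.
try:
    int(m[2])
except ValueError as exc:
    conversion_error = exc


def normalize_time(hour_str, minute_str):
    """Hypothetical helper: return a new 'HH:MM' string, or None if out of range."""
    hour = int(hour_str) if hour_str else 0        # '' or None counts as 0
    minute = int(minute_str) if minute_str else 0
    # Valid hours are 0-23 and valid minutes are 0-59.
    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}:{minute:02d}"


print(normalize_time(m[1], m[2]))  # 06:00
```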
I think I could use regex capturing groups with the match object's group() or groups() methods to extract the first time mentioned in the input string and then the second time that appears in it.
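For reference, group() and groups() are methods of the Match objects returned by re.search, re.match, or re.finditer; there is no module-level re.group(). A minimal sketch, assuming a simplified pattern that captures an hour and an optional minute: re.finditer yields one Match per time in the interval, and an optional group that did not take part comes back as None.

```python
import re

text = "entre la 1:50 -- 6 :"
# Illustrative pattern: hour digits, optional spaces and colon, optional minute digits.
pattern = re.compile(r"(\d+) *:? *(\d+)?")

# groups() returns every capture as a tuple, with None for a missing optional group.
times = [m.groups() for m in pattern.finditer(text)]
print(times)  # [('1', '50'), ('6', None)]

# group(0) is the whole matched text; group(1) and group(2) are the captures.
first = next(pattern.finditer(text))
print(first.group(0), first.group(1), first.group(2))  # 1:50 1 50
```

The None for the missing minute is why any replacement logic has to treat an absent group as 0 before calling int().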
At the end, printing the modified string should produce the following output for each of the examples, respectively:
"entre las 15:00 hs -- 16:10 " #Example 1
"entre las 21:00 -- 22:00" #Example 2
"entre la 01:30 -- 02:00 " #Example 3
"entre la 01:09 h.s. -- 06:00 hs." #Example 4
"entre la 01:50 -- 06:00" #Example 5
"entre la 07:59 -- 23:00" #Example 6
"entre la 10:00 -- 00:10" #Example 7
Some additional examples of what the time (hours:minutes) conversions should look like:
"6 :" ---> "06:00"
"6:" ---> "06:00"
"6" ---> "06:00"
": 6" ---> "00:06"
":6" ---> "00:06"
": 16" ---> "00:16"
":16" ---> "00:16"
" 6" ---> "06:00"
"15 : 1" ---> "15:01"
"15 1" ---> "15:01"
": 15" ---> "00:15"
"0 15" ---> "00:15"
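The fragment conversions above can be checked with a small table-driven sketch. It assumes each fragment is handled on its own, strips surrounding whitespace first (so " 6" is treated as "6"), and leaves out-of-range values unchanged. The pattern (optional hour digits, an optional colon with optional spaces around it, optional minute digits) and the name convert_fragment are assumptions for illustration.

```python
import re

# Assumed pattern: optional hour digits, optional ':' with optional spaces, optional minute digits.
FRAGMENT = re.compile(r"(?P<hour>\d+)? *:? *(?P<minute>\d+)?")


def convert_fragment(fragment):
    """Hypothetical helper: convert one time fragment to 'HH:MM'."""
    m = FRAGMENT.fullmatch(fragment.strip())
    if m is None:
        return fragment  # not a time fragment; leave it unchanged
    hour = int(m.group("hour") or "0")
    minute = int(m.group("minute") or "0")
    if hour > 23 or minute > 59:
        return fragment  # out of range; leave it unchanged
    return f"{hour:02d}:{minute:02d}"


# The conversion examples from the list above.
EXAMPLES = {
    "6 :": "06:00", "6:": "06:00", "6": "06:00",
    ": 6": "00:06", ":6": "00:06", ": 16": "00:16", ":16": "00:16",
    " 6": "06:00", "15 : 1": "15:01", "15 1": "15:01",
    ": 15": "00:15", "0 15": "00:15",
}

for fragment, expected in EXAMPLES.items():
    assert convert_fragment(fragment) == expected, fragment
print("all fragment examples pass")
```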
I am having problems extracting the values to evaluate within fix_time_patterns_in_time_intervals() after identifying them with the regex. I hope you can help me with this.
Answer:
You can use this regex to match your time values:
(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )
This matches:
- "(?=[:\d])": assert that the match starts with a digit or a colon; this ensures that we always start by matching the hour group if it is present
- "(?P<hour>\d+)?": optional digits captured in the hour group
- " *:? *": an optional colon surrounded by optional spaces
- "(?P<minute>\d+)?": optional digits captured in the minute group
- "(?<! )": assert that the match doesn't end in a space, so we don't chew up spaces used for formatting
Regex demo on regex101
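To see what each part of the pattern captures, a quick sketch can print the matched text and the named groups for a few fragments taken from the examples; groupdict() maps each group name to its capture, or None when that group did not take part. The fragment list is chosen for illustration.

```python
import re

regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

fragments = ["15 : hs", "1 09", ": 10", "6 : ", "1:50"]
matched = {}
captures = {}
for fragment in fragments:
    m = regex.search(fragment)
    matched[fragment] = m.group(0)       # the text a replacement would overwrite
    captures[fragment] = m.groupdict()   # None for a group that did not take part
    print(repr(fragment), '->', repr(m.group(0)), m.groupdict())
```

For "15 : hs" and "6 : " the match stops right after the colon: the lookbehind forces the optional spaces to backtrack, so the trailing space stays in the text for formatting.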
You can then use this replacement function to check whether each match group exists and, if the values are valid, reformat them with leading zeros as required:
def fix_time_patterns_in_time_intervals(match_num_time):
    # a group that did not match is None, so fall back to '0' before int()
    hour = int(match_num_time.group('hour') or '0')
    minute = int(match_num_time.group('minute') or '0')
    if hour > 23 or minute > 59:
        # invalid, don't convert
        return match_num_time.group(0)
    # zero-pad both parts to two digits
    return f'{hour:02d}:{minute:02d}'
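This works because re.sub accepts a function as the replacement: it is called once per match with the Match object, and whatever string it returns is substituted. Returning m.group(0) puts the original text back, which is how invalid times are skipped. A generic sketch of that pass-through idea; the pattern and the name pad_if_valid_minute are illustrative only.

```python
import re


def pad_if_valid_minute(m):
    """Illustrative callback: zero-pad numbers 0-59, leave anything else untouched."""
    value = int(m.group(0))
    if value > 59:
        return m.group(0)  # returning the original text means "no change"
    return f"{value:02d}"


result = re.sub(r"\d+", pad_if_valid_minute, "at 5, 42 and 75 minutes")
print(result)  # at 05, 42 and 75 minutes
```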
For your sample data (with a couple of invalid values):
import re

times = [
    "entre las 15 : hs -- 16:10 ",
    "entre las 21 : -- 22",
    "entre la 1 30 -- 2 ",
    "entre la 1 09 h.s. -- 6 : hs.",
    "entre la 25 0 -- 12:0",
    "entre las 13 64 -- 5",
    "entre la 1:50 -- 6 :",
    "entre la 7 59 -- 23 : ",
    "entre la 10: -- : 10"
]
regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')
for time in times:
    print(regex.sub(fix_time_patterns_in_time_intervals, time))
Output:
entre las 15:00 hs -- 16:10
entre las 21:00 -- 22:00
entre la 01:30 -- 02:00
entre la 01:09 h.s. -- 06:00 hs.
entre la 25 0 -- 12:00
entre las 13 64 -- 05:00
entre la 01:50 -- 06:00
entre la 07:59 -- 23:00
entre la 10:00 -- 00:10
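As a quick check, the sketch below runs the answer's regex and replacement function against Examples 1-7 from the question and compares the results with the expected outputs listed there. The comparison ignores trailing whitespace, because the expected string for Example 6 drops the trailing space that both the input and the substituted output keep.

```python
import re

regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')


def fix_time_patterns_in_time_intervals(match_num_time):
    hour = int(match_num_time.group('hour') or '0')
    minute = int(match_num_time.group('minute') or '0')
    if hour > 23 or minute > 59:
        return match_num_time.group(0)  # invalid, keep the original text
    return f'{hour:02d}:{minute:02d}'


# (input, expected output) pairs copied from Examples 1-7 in the question.
cases = [
    ("entre las 15 : hs -- 16:10 ", "entre las 15:00 hs -- 16:10 "),
    ("entre las 21 : -- 22", "entre las 21:00 -- 22:00"),
    ("entre la 1 30 -- 2 ", "entre la 01:30 -- 02:00 "),
    ("entre la 1 09 h.s. -- 6 : hs.", "entre la 01:09 h.s. -- 06:00 hs."),
    ("entre la 1:50 -- 6 :", "entre la 01:50 -- 06:00"),
    ("entre la 7 59 -- 23 : ", "entre la 07:59 -- 23:00"),
    ("entre la 10: -- : 10", "entre la 10:00 -- 00:10"),
]

for text, expected in cases:
    result = regex.sub(fix_time_patterns_in_time_intervals, text)
    # Compare without trailing whitespace (see Example 6).
    assert result.rstrip() == expected.rstrip(), (text, result)
print("all 7 examples match")
```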