I have this part of a regex that I use in Python:
(?<!\d)(\d??\d )?(hours|hour|hrs|hr|h)?( and |, )??(?<!\d)(\d??\d )?(minutes|minute|mins|min|m)?( and |, )?(?<!\d)(\d??\d )?(seconds|second|secs|sec|s)?
I used regex101 to check it, and it does not work the way I want. For some reason, when I press Enter to go to a new line, it reports a match.
For example, I have 10 blank lines and regex101 found 10 matches, all matched at the beginning of each of the blank lines. Also, if I spammed space on a blank line, it matches every single blank space. The match information on the right side of the webpage didn't help because it only showed "null".
I have tried both \s and a literal space (as shown in the regex above). Both give the same result: 10 blank lines, 10 matches found. I can't work out why it matches every single blank space, or how to fix it.
Full piece of regex:
(([0][0-9]|[1][0-9]|[2][0-3]) ?:([0][0-9]|[1][0-9]|[2][0-9]|[3][0-9]|[4][0-9]|[5][0-9]) ?:([0][0-9]|[1][0-9]|[2][0-9]|[3][0-9]|[4][0-9]|[5][0-9]) ?)|((?<!\d|\w)(\d?\d\s) ?((hours)|(hour)|(hrs)|(hr)|(h)))?((\sand\s)|(,\s?))?((?<!\d|\w)(\d?\d\s) ?((minutes)|(minute)|(mins)|(min)|(m)))?((\sand\s)|(,\s?))?((?<!\d|\w)(\d?\d\s) ?((seconds)|(second)|(secs)|(sec)|(s)))?
If I type "a", regex101 matches "" before the "a", and if I type "5 hours", regex101 matches "5 hours" and then "" after it. In VS Code, if the user types nothing, the code fails to raise the ValueError that is supposed to re-prompt the user for the time. The same happens if the user types a space or random input like "asdfasdf". I have tried to catch it with:
if test.group() == "":
    raise ValueError
It still fails to raise ValueError.
My desired results for the snippet include the following (only 12 shown; many more combinations are possible). Each should be matched as one whole match, instead of 2 groups for "5 hours" and 8 groups for "5 hours, 5 minutes and 5 seconds":
- 5 hours, 5 minutes and 5 seconds
- 5 hours and 5 minutes
- 5 hours and 5 seconds
- 5 hours, 5 minutes, 5 seconds
- 5 hrs, 5 mins, 5 secs
- 5 hours
- 5 minutes
- 5 seconds
- 5 hours and 5 minutes
- 5 minutes and 5 seconds
- 5 minutes, 5 seconds
- 5 hours, 5 seconds
CodePudding user response:
You can match:
- hours with
(\d+\s(?:hours|hour|hrs|hr|h))?
- minutes with
(\d+\s(?:minutes|minute|mins|min|m))?
- seconds with
(\d+\s(?:seconds|second|secs|sec|s))?
- separators with
(?:, | and )
If you combine these together, you get the regex you were looking for:
(\d+\s(?:hours|hour|hrs|hr|h))?(?:, | and )?(\d+\s(?:minutes|minute|mins|min|m))?(?:, | and )?(\d+\s(?:seconds|second|secs|sec|s))?
Then you need to extract your groups. You can check the following Python code:
import re

strings = [
    '5 hours, 5 minutes and 5 seconds',
    '5 hours and 5 minutes',
    '5 hours and 5 seconds',
    '5 hours, 5 minutes, 5 seconds',
    '5 hrs, 5 mins, 5 secs',
    '5 hours',
    '5 minutes',
    '5 seconds',
    '5 hours and 5 minutes',
    '5 minutes and 5 seconds',
    '5 minutes, 5 seconds',
    '5 hours, 5 seconds'
]

# Each unit is optional, so its group is None when that unit is absent.
pattern = r'(\d+\s(?:hours|hour|hrs|hr|h))?(?:, | and )?(\d+\s(?:minutes|minute|mins|min|m))?(?:, | and )?(\d+\s(?:seconds|second|secs|sec|s))?'

# One tuple of (hours, minutes, seconds) groups per input string.
print([re.search(pattern, string).groups() for string in strings])

# Alternative: print the whole match instead of the individual groups.
# for string in strings:
#     match = re.search(pattern, string)
#     if match:
#         print(match.group())
Check the regex demo and the python demo.
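Since every part of the combined pattern is optional, it can still match an empty string, which is the behaviour the question wants to reject. One possible way to handle that, sketched below, is to anchor with re.fullmatch and then require at least one captured unit. The name parse_duration, the choice to capture only the digits, and the "+" quantifier after \d are assumptions made for this illustration, not something stated in the question.

```python
import re

# Same structure as the combined pattern: optional hours, minutes, seconds
# with optional ", " / " and " separators. Assumption: "\d+" for the number,
# and only the digits are captured.
DURATION = re.compile(
    r'(?:(\d+)\s(?:hours|hour|hrs|hr|h))?'
    r'(?:, | and )?'
    r'(?:(\d+)\s(?:minutes|minute|mins|min|m))?'
    r'(?:, | and )?'
    r'(?:(\d+)\s(?:seconds|second|secs|sec|s))?'
)

def parse_duration(text):
    """Return (hours, minutes, seconds) as ints, or raise ValueError."""
    match = DURATION.fullmatch(text.strip())
    # fullmatch succeeds on "" because everything is optional,
    # so also require that at least one unit was captured.
    if match is None or not any(match.groups()):
        raise ValueError(f"not a duration: {text!r}")
    return tuple(int(g) if g else 0 for g in match.groups())

print(parse_duration('5 hours, 5 minutes and 5 seconds'))
print(parse_duration('5 minutes'))
```

With this check, empty input, whitespace-only input and text like "asdfasdf" all raise ValueError, so the caller can re-prompt.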
CodePudding user response:
That's because your regex is allowed to match nothing. Let's rewrite your regex:
(?<!\d)(\d??\d )?(hours|hour|hrs|hr|h)?( and |, )??(?<!\d)(\d??\d )?(minutes|minute|mins|min|m)?( and |, )?(?<!\d)(\d??\d )?(seconds|second|secs|sec|s)?
into something easier to read:
(...)?(...)?(...)??(...)? ...
Every group is optional, so the regex is allowed to match the empty string.
It does not only match empty lines: given a line of spaces, it produces an empty match at every space.
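This is easy to reproduce in Python. The sketch below runs the snippet regex from the question, unchanged, against an empty string, a run of spaces and a non-matching letter; the helper name all_matches is made up for this illustration.

```python
import re

# The snippet regex from the question, unchanged (split only for readability).
pattern = re.compile(
    r'(?<!\d)(\d??\d )?(hours|hour|hrs|hr|h)?( and |, )??'
    r'(?<!\d)(\d??\d )?(minutes|minute|mins|min|m)?( and |, )?'
    r'(?<!\d)(\d??\d )?(seconds|second|secs|sec|s)?'
)

def all_matches(text):
    """Return the matched text of every match that finditer reports."""
    return [m.group() for m in pattern.finditer(text)]

print(repr(pattern.search('').group()))  # a match object exists, but it is empty
print(all_matches('   '))                # nothing but empty strings
print(all_matches('a'))                  # nothing but empty strings
print(all_matches('5 hours'))            # the real match comes first
```

Because search never returns None here, a check like "if match:" cannot tell valid input from garbage; the matched text has to be inspected as well.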
The solution is to replace the chain of optional groups (...)?(...)? with an alternation (...)|(...).
How to do that depends on your requirements. You have not stated them, so here is an example based on my guess:
(?<!\d)(\d?\d )|(hours|hour|hrs|hr|h)|( and |, )??(?<!\d)(\d??\d )|(minutes|minute|mins|min|m)|( and |, )|(?<!\d)(\d??\d )|(seconds|second|secs|sec|s)
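A quick way to see how this alternation version behaves is to run it with finditer. The sketch below uses the regex exactly as written above; the helper name pieces is made up for this illustration. It suggests that empty and whitespace-only input no longer match, but also that a phrase comes back as several separate matches, so they would still need to be reassembled if one whole match is wanted.

```python
import re

# The alternation regex from the example, unchanged (split only for readability).
alternation = re.compile(
    r'(?<!\d)(\d?\d )|(hours|hour|hrs|hr|h)|( and |, )??(?<!\d)(\d??\d )'
    r'|(minutes|minute|mins|min|m)|( and |, )|(?<!\d)(\d??\d )'
    r'|(seconds|second|secs|sec|s)'
)

def pieces(text):
    """Return the matched text of every match that finditer reports."""
    return [m.group() for m in alternation.finditer(text)]

print(pieces(''))         # no matches at all
print(pieces('   '))      # no matches at all
print(pieces('5 hours'))  # the number and the unit arrive as separate matches
```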