I am writing a regex to parse a string of times in hhmm format. It must read input in the following forms:
0930
0930 1
0930-1
<0930
>0930
<0920 1
>0920 1
0920-1240 1
1200-1-1430
1200-1-1400 1
0920-1240   (ISSUE HERE)
The regex cannot tell hhmm-1 apart from hhmm-hhmm: it reads '0900-1200' as '0900-1'.
I have tried many variations of the regex, including:
r'([<>])?([0-9]{2})([0-9]{2})([ -]?)([0-1]?)|([0-9]{2})([0-9]{2})'
r'([<>])?([0-9]{2})([0-9]{2})([ -])?([0-1]?)(([0-1]?{4})()'
r'([<>])?([0-9]{2})([0-9]{2})([ -])?([0-1]?)(?([0-1]?)()'
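A quick way to reproduce the problem with Python's re module, using the first attempt above unchanged on the problem case '0900-1200'. The optional ([ -]?)([0-1]?) part happily consumes '-1', so the leftmost match stops there:

```python
import re

# First attempt from the question, unchanged.
attempt = re.compile(r'([<>])?([0-9]{2})([0-9]{2})([ -]?)([0-1]?)|([0-9]{2})([0-9]{2})')

m = attempt.search('0900-1200')
print(m.group(0))   # 0900-1  (the '200' that follows is left over)
print(m.groups())   # unmatched groups are None
```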
For now I am considering using two different regexes: one for the hyphenated time range and one for everything else. Both of those work for me. I would like the output as a list of tuples, like:
[('', '09', '30', '-', '', '12', '30', '-', '1'),
 ('', '09', '30', '-', '1', '', '', '', ''),
 ('>', '09', '30', '-', '1', '', '', '', ''), ...]
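The two-regex approach described above could be sketched roughly as follows. The two patterns themselves are not shown in the question, so RANGE, SINGLE and parse_line are hypothetical names and patterns written for this sketch. The idea is to try the hyphenated-range pattern first and only fall back to the single-time pattern when it does not match the whole line:

```python
import re

# Hypothetical patterns (the question does not show the two regexes it mentions).
# Range: [<>]hhmm[ -flag]-hhmm[ -flag]; the range hyphen itself is not captured.
RANGE = re.compile(r'([<>])?(\d{2})(\d{2})(?:([ -])([01]))?-(\d{2})(\d{2})(?:([ -])([01]))?')
# Single time: [<>]hhmm[ -flag]
SINGLE = re.compile(r'([<>])?(\d{2})(\d{2})(?:([ -])([01]))?')


def parse_line(line):
    """Return a 9-tuple of strings for one line, or None if neither pattern fits."""
    # Try the range pattern first so '0920-1240' is never cut short to '0920-1'.
    m = RANGE.fullmatch(line)
    if m:
        groups = m.groups()
    else:
        m = SINGLE.fullmatch(line)
        if not m:
            return None
        groups = m.groups() + (None,) * 4  # pad to the same 9 fields as RANGE
    # Unmatched optional groups come back as None; use '' like re.findall does.
    return tuple(g or '' for g in groups)


text = """0930
0930-1
>0930
0920-1240 1
1200-1-1430
0920-1240"""

result = [parse_line(line) for line in text.splitlines()]
for row in result:
    print(row)
```

Because each line is matched as a whole with fullmatch, this sketch assumes one time expression per line. The fields are ordered prefix, hh, mm, separator, flag, hh, mm, separator, flag, which is close to, but not exactly, the layout in the desired output above.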
Answer:
You can use
([<>])?([0-9]{2})([0-9]{2})(?:([ -])([01])(?!\d{3}\b))?(?:([ -])([0-9]{2})([0-9]{2})(?:([ -])([01])(?!\d{3}\b))?)?
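A minimal sketch of using this pattern from Python, assuming the times arrive as a multi-line string with one expression per line (the sample lines are taken from the question). re.findall returns one tuple per match:

```python
import re

# The pattern from the answer, unchanged (split over two raw strings for width).
pattern = re.compile(
    r'([<>])?([0-9]{2})([0-9]{2})(?:([ -])([01])(?!\d{3}\b))?'
    r'(?:([ -])([0-9]{2})([0-9]{2})(?:([ -])([01])(?!\d{3}\b))?)?'
)

text = """0930
0930-1
<0930 1
0920-1240 1
1200-1-1430
0920-1240"""

# One 10-item tuple per match; groups that did not take part come back as ''.
matches = pattern.findall(text)
for m in matches:
    print(m)
```

Each tuple has 10 fields rather than the 9 in the question, because the separator before a flag (Group 4) and the range separator (Group 6) are captured separately. For example, '0920-1240' yields ('', '09', '20', '', '', '-', '12', '40', '', '').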
See the regex demo. Details:
([<>])? - Group 1 (optional): < or >
([0-9]{2}) - Group 2: two digits
([0-9]{2}) - Group 3: two digits
(?:([ -])([01])(?!\d{3}\b))? - an optional group matching a sequence of:
    ([ -]) - Group 4: a space or -
    ([01])(?!\d{3}\b) - Group 5: 1 or 0 that is not followed by three more digits and then a word boundary
(?: - start of a non-capturing group:
    ([ -]) - Group 6: a space or -
    ([0-9]{2}) - Group 7: two digits
    ([0-9]{2}) - Group 8: two digits
    (?:([ -])([01])(?!\d{3}\b))? - an optional sequence of a space or - (captured in Group 9) and then 1 or 0 (captured in Group 10) that is not followed by three more digits and then a word boundary
)? - end of the non-capturing group, repeated 1 or 0 times.
The lookahead is what separates hhmm-1 from hhmm-hhmm: a 1 or 0 after the separator is only taken as a flag when it is not the first digit of another four-digit time, so '0900-1200' is read as a range instead of '0900-1'.
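One way to keep the group-by-group description next to the pattern is Python's re.VERBOSE flag. The sketch below is the same regex with comments added; in verbose mode whitespace is ignored except inside a character class, so [ -] still means "space or hyphen". The check at the end confirms that both spellings behave identically on the sample lines and that the lookahead moves the hyphen of '0920-1240' into Group 6:

```python
import re

# Same regex as in the answer, with re.VERBOSE comments mirroring the breakdown.
verbose = re.compile(r"""
    ([<>])?                          # Group 1 (optional): < or >
    ([0-9]{2})                       # Group 2: two digits
    ([0-9]{2})                       # Group 3: two digits
    (?:                              # optional flag after the first time:
        ([ -])                       #   Group 4: space or -
        ([01])(?!\d{3}\b)            #   Group 5: 0 or 1, not followed by 3 digits + boundary
    )?
    (?:                              # optional second time:
        ([ -])                       #   Group 6: space or -
        ([0-9]{2})                   #   Group 7: two digits
        ([0-9]{2})                   #   Group 8: two digits
        (?:([ -])([01])(?!\d{3}\b))? #   Groups 9 and 10: optional flag after the second time
    )?
""", re.VERBOSE)

compact = re.compile(r'([<>])?([0-9]{2})([0-9]{2})(?:([ -])([01])(?!\d{3}\b))?'
                     r'(?:([ -])([0-9]{2})([0-9]{2})(?:([ -])([01])(?!\d{3}\b))?)?')

samples = ['0930', '0930-1', '>0930', '<0920 1', '0920-1240 1',
           '1200-1-1430', '1200-1-1400 1', '0920-1240']

# Both spellings should give identical results on every sample line.
same = all(verbose.findall(s) == compact.findall(s) for s in samples)
print(same)

# In '0920-1240' the '1' after '-' is followed by '240' and then the end of the
# string, so the lookahead rejects it as a flag: Groups 4/5 stay empty and the
# hyphen is captured as the range separator in Group 6 instead.
m = verbose.match('0920-1240')
print(m.group(0), m.group(4), m.group(6))
```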