Giving more priority to later capture group in date regex

Time:10-11

I am preparing a Python regex to match an idiosyncratic format for timedeltas.

Here are some examples of strings I will feed into the regex:

1:42.15 (1)
19.78 (1)
2-24:04
8:03.280 (1)

So the overall format is hour-minute:second.second_fractions, sometimes padded with zeroes. The number in parentheses that appears at the end of some strings must be ignored.

I would like to match each line using three capture groups so that the 1st group is always the hour, the 2nd is always the minute and the 3rd is always the seconds and second fractions together.

I wrote this regex: (\d{0,2})-?(\d{0,2}):?(\d{0,2}\.?\d*)

This successfully matches all the examples I have tried it on, but there is an issue. Because of the greedy way the regex matches, when e.g. the hour is missing, the minute is captured by the first capture group instead of the second as I intended.

That is, with the input 1:42.15 (1) I get the output ('1', '', '42.15'). What I actually wanted is the output ('', '1', '42.15') - the minute always corresponding to the second capture group.
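The behaviour can be reproduced with a few lines. This is a minimal sketch, assuming the pattern is applied with re.match from the start of the string; the variable names are only for illustration.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(\d{0,2})-?(\d{0,2}):?(\d{0,2}\.?\d*)")

# No hour here, yet the minute "1" ends up in group 1: the first
# \d{0,2} takes it, the optional "-" matches nothing, and group 2
# is left with the empty string just before the ":".
groups_no_hour = ORIGINAL.match("1:42.15 (1)").groups()
print(groups_no_hour)  # ('1', '', '42.15')

# With all three fields present the groups line up as intended.
groups_full = ORIGINAL.match("2-24:04").groups()
print(groups_full)  # ('2', '24', '04')
```

Nothing in the pattern ties a digit group to the delimiter that follows it, so the first group is free to consume whichever digits come first.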

How can I modify the priorities of the capture groups to achieve this behaviour?

CodePudding user response:

You can wrap each of the first two optional fields together with its delimiter, so that a field can only match when its delimiter is actually present, and use

^(?:(\d{0,2})-)?(?:(\d{0,2}):)?(\d{0,2}\.?\d*)

Details:

  • ^ - start of string
  • (?:(\d{0,2})-)? - an optional non-capturing group that matches one or zero occurrences of
    • (\d{0,2}) - Group 1: zero to two digits
    • - - a hyphen
  • (?:(\d{0,2}):)? - an optional non-capturing group that matches one or zero occurrences of
    • (\d{0,2}) - Group 2: zero to two digits
    • : - a colon
  • (\d{0,2}\.?\d*) - Group 3: zero to two digits, an optional . and then zero or more digits.
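The pattern above can be checked against the four sample strings from the question. This is a sketch rather than a definitive implementation: parse_fields and to_timedelta are hypothetical helper names, not part of the answer. One thing to be aware of is that in Python an optional group that does not take part in the match is reported as None rather than '', so the sketch normalises None to '' to get the output shape asked for in the question.

```python
import re
from datetime import timedelta

# Pattern from the answer: each optional field is tied to its delimiter.
PATTERN = re.compile(r"^(?:(\d{0,2})-)?(?:(\d{0,2}):)?(\d{0,2}\.?\d*)")

def parse_fields(text):
    """Return (hour, minute, seconds) as strings, '' where a field is absent."""
    # Every part of the pattern is optional, so match() always succeeds.
    match = PATTERN.match(text)
    # Groups that did not participate are None; normalise them to ''.
    return tuple(g or "" for g in match.groups())

def to_timedelta(text):
    """Hypothetical helper: turn the captured fields into a timedelta."""
    hour, minute, seconds = parse_fields(text)
    return timedelta(
        hours=int(hour or 0),
        minutes=int(minute or 0),
        seconds=float(seconds or 0),
    )

for sample in ["1:42.15 (1)", "19.78 (1)", "2-24:04", "8:03.280 (1)"]:
    print(sample, "->", parse_fields(sample))
# 1:42.15 (1) -> ('', '1', '42.15')
# 19.78 (1) -> ('', '', '19.78')
# 2-24:04 -> ('2', '24', '04')
# 8:03.280 (1) -> ('', '8', '03.280')
```

The trailing "(1)" is ignored simply because the pattern stops matching at the space. The timedelta conversion assumes well-formed input; a seconds field consisting of a lone "." would make float() raise ValueError and is not handled here.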