In a string, I need all timecodes to be formatted as follows: [HH:MM:SS] or [HH:MM:SS.ms]. Some of them are already in brackets. They can be anywhere: at the beginning, in the middle, or at the end of a phrase.
I'd like to put brackets around those that are not already bracketed.
To select all of them I use:
[\[]?\d\d:\d\d:\d\d(.\d+)?[\]]?
I tried:
(?!\[.+\])(.|^)(\d\d:\d\d:\d\d(.\d+)?)(.|$)(?!\[.+\])
This is almost fine, except that my selection ($2) includes space characters when the timecode is not at the beginning (^) or the end ($) of the string.
How can I get rid of this selection?
Answer:
You can use
re.sub(r'\[?\b(\d{2}:\d{2}:\d{2}(?:\.\d+)?)\b]?', r'[\1]', text)
See the regex demo. Details:
- \[? - an optional [ char
- \b - a word boundary
- (\d{2}:\d{2}:\d{2}(?:\.\d+)?) - Group 1:
  - \d{2}:\d{2}:\d{2} - two digits, and then two occurrences of : and two digits
  - (?:\.\d+)? - an optional sequence of . and one or more digits
- \b - a word boundary
- ]? - an optional ] char
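As a quick check, the substitution above can be run on a sample string. This is a minimal sketch: the helper name bracket_timecodes and the sample text are made up for illustration and are not part of the original answer.

```python
import re

# Pattern from the answer: optional "[", word boundary, HH:MM:SS with an
# optional fractional part (captured in group 1), word boundary, optional "]".
TIMECODE = re.compile(r'\[?\b(\d{2}:\d{2}:\d{2}(?:\.\d+)?)\b]?')

def bracket_timecodes(text):
    """Wrap every timecode in brackets (hypothetical helper name)."""
    return TIMECODE.sub(r'[\1]', text)

sample = "00:00:05 start, 00:01:02 middle [00:03:04.500] end 10:20:30.25"
print(bracket_timecodes(sample))
# [00:00:05] start, [00:01:02] middle [00:03:04.500] end [10:20:30.25]
```

Because the brackets are matched outside the capturing group, a timecode that is already bracketed is consumed together with its brackets and rewritten unchanged, so it is not wrapped twice.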
To make sure you match 24-hour time format you can use a more precise pattern:
\[?\b((?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](?:\.[0-9]+)?)\b]?
See this demo.
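The stricter 24-hour pattern can be tried the same way. Again, this is only a sketch: the helper name bracket_valid_timecodes and the sample string are assumptions for illustration.

```python
import re

# Stricter variant: hours 00-23, minutes and seconds 00-59,
# optional fractional seconds; group 1 holds the bare timecode.
VALID_TIMECODE = re.compile(
    r'\[?\b((?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](?:\.[0-9]+)?)\b]?'
)

def bracket_valid_timecodes(text):
    """Bracket only timecodes that are valid 24-hour times (hypothetical helper)."""
    return VALID_TIMECODE.sub(r'[\1]', text)

sample_24h = "ok 23:59:59 bad 25:61:00 [07:08:09.1]"
print(bracket_valid_timecodes(sample_24h))
# ok [23:59:59] bad 25:61:00 [07:08:09.1]
```

Note that out-of-range values such as 25:61:00 are simply left untouched, so whether this variant is preferable depends on whether such strings should be bracketed or ignored.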