Regex in python for time format followed by comma and three digits

Time:10-13

I have a file with thousands of timestamps. Some are in the standard format, while others are followed by a comma and three digits, like this:

    Standard format: 00:00:44
    Followed by comma and three digits: 00:00:46,235

I've removed the standard formats using the following regex:

   text = re.sub(r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$)', '', text)

That works. But nothing I've tried so far removes the times that are followed by a comma and three digits. How can I remove this second pattern?

CodePudding user response:

Your regex matches the standard time format.

r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$)'

Just add the comma part at the end, and make it optional.

r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:,\d{3})?$)'

Explanation for (?:,\d{3})?:

(?:      )     Non-capturing group
   ,\d{3}      Comma, then three digits
          ?    Match zero or one times
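
If the whole file is held in one string, one option is to add the re.MULTILINE flag so that ^ and $ anchor at every line instead of only at the ends of the string. A minimal sketch, assuming each timestamp sits on its own line; the sample text and the name TIME_RE are made up for illustration:

```python
import re

# Pattern from the answer above: HH:MM:SS with an optional ",ddd" suffix.
# re.MULTILINE lets ^ and $ match at the start and end of each line.
TIME_RE = re.compile(
    r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:,\d{3})?$)',
    re.MULTILINE,
)

# Hypothetical sample; the real file contents are not shown in the question.
sample = "00:00:44\nsome text\n00:00:46,235\nmore text"

cleaned = TIME_RE.sub('', sample)
print(repr(cleaned))  # '\nsome text\n\nmore text'
```

The line breaks themselves are not part of the match, so each removed timestamp leaves an empty line behind.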

CodePudding user response:

The quick and dirty way is to use split():

text = text.split(",")[0]
text = re.sub(r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$)', '', text)
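
To see what these two steps do, here is a small sketch that wraps them in a helper (remove_time is a hypothetical name, not something from the question) and runs it on single values. It assumes text holds one value at a time, because split(",")[0] discards everything after the first comma:

```python
import re

def remove_time(text):
    # Step 1: keep only what comes before the first comma.
    text = text.split(",")[0]
    # Step 2: remove a bare HH:MM:SS using the regex from the question.
    return re.sub(r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$)', '', text)

print(repr(remove_time("00:00:44")))      # ''
print(repr(remove_time("00:00:46,235")))  # ''
print(repr(remove_time("Hello, world")))  # 'Hello'
```

The last call shows why this is "quick and dirty": any other text containing a comma is truncated too.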

You can also update your regex to add an optional part at the end.

text = re.sub(r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d),?\d{0,3}$', '', text)
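
Note that ,?\d{0,3} makes the comma and the digits optional independently of each other, so this pattern accepts a few more shapes than "comma plus exactly three digits". The sketch below compares it with the grouped form (?:,\d{3})? from the first answer; the candidate strings are made up for illustration:

```python
import re

# Pattern from this answer: comma and digits are each optional on their own.
loose = re.compile(r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d),?\d{0,3}$')
# Pattern from the first answer: the comma and three digits come as one unit.
strict = re.compile(r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:,\d{3})?$)')

candidates = ["00:00:46", "00:00:46,235", "00:00:46,2", "00:00:46235", "00:00:46,"]

loose_hits = [s for s in candidates if loose.match(s)]
strict_hits = [s for s in candidates if strict.match(s)]

print(loose_hits)   # all five candidates
print(strict_hits)  # ['00:00:46', '00:00:46,235']
```

Whether the looser match matters depends on what else appears in the file; if only the two formats from the question occur, both patterns behave the same.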

CodePudding user response:

Using re.sub:

import re

inp = "Followed by comma and three digits: 00:00:46,235"
# Keep the HH:MM:SS part (group 1) and drop the comma and three digits after it
output = re.sub(r'\b(\d{2}:\d{2}:\d{2}),\d{3}', r'\1', inp)
print(output)  # Followed by comma and three digits: 00:00:46