How to convert all regex properly?

Time:04-10

I have some SRT files, with timestamps and subtitles such as this (original):

1
00:00:00,000 --> 00:00:02,016
how are you

2
00:00:02,112 --> 00:00:05,472
i m fine

3
00:00:06,048 --> 00:00:08,448
thanks

I have to translate them into other languages. After translation, the timestamps come back corrupted, like this (the commas are gone, and the numbers are in some improper format that can't be shown here):

1
00:00:00000 --> 00:00:02016
Comment ça va?

2
00:00:02112 --> 00:00:05472
Je vais bien

3
00:00:06048 --> 00:00:08448
Merci beaucoup

People suggested fixing it like this:

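import re

regex = r"\d{2}:\d{2}:\d{2}(?=\d{3}\b)"

# Read the whole translated SRT file; filename is the path to that file.
with open(filename, "r", encoding="utf8") as f:
    s = f.read()

# Re-insert the comma between the seconds and the milliseconds.
result = re.sub(regex, r"\g<0>,", s)

if result:
    print(result)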

The problem is that everything works EXCEPT for the last timestamp line (00:00:06048 --> 00:00:08448 in this case), which the code does not convert properly. How can I fix it? Thanks.

CodePudding user response:

Loop over the lines and translate only those that are neither an index nor a timestamp.

Regex info:

match index: ^\d+$
match timestamps: \d{2}:.+ -->

import re
 
srt = '''1
00:00:00,000 --> 00:00:02,016
how are you

2
00:00:02,112 --> 00:00:05,472
i m fine
thanks

3
00:00:06,048 --> 00:00:08,448
welcome'''

# Placeholder for the real translation step: it just reverses the text.
def translate(s):
  return s[::-1]

srt = srt.split("\n")

for i in range(len(srt)):
  line = srt[i].strip()
  # Skip blank lines, index lines (digits only) and timestamp lines.
  if line and not re.match(r"^\d+$|\d{2}:.+ -->", line, re.MULTILINE):
    srt[i] = translate(line)

print("\n".join(srt))

CodePudding user response:

I found a solution to this problem: the timestamp itself was not the cause. I just added a new blank line at the end of the file, and everything works fine.
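
The workaround above could be sketched roughly as follows. This is only a minimal sketch: it reuses the regex from the question and assumes the missing comma is the only damage to the timestamps. The function name restore_commas and the sample string are made up for illustration.

```python
import re

# Pattern taken from the question: HH:MM:SS followed directly by three digits.
TIMESTAMP_WITHOUT_COMMA = re.compile(r"\d{2}:\d{2}:\d{2}(?=\d{3}\b)")


def restore_commas(text):
    """Re-insert the comma before the milliseconds in damaged SRT timestamps."""
    # Workaround described above: make sure the text ends with a newline.
    if not text.endswith("\n"):
        text += "\n"
    return TIMESTAMP_WITHOUT_COMMA.sub(r"\g<0>,", text)


# Hypothetical sample: the last cue of a file with no trailing newline.
damaged = "3\n00:00:06048 --> 00:00:08448\nMerci beaucoup"
print(restore_commas(damaged))
```

Timestamps that already contain the comma are left alone, because the lookahead requires the three digits to come right after the seconds.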
