Split string using regex every 6 digits

I basically need to split a string before each occurrence of 6 digits followed by a colon:

import re
my_str = '610640: 168 hours 610835: till next day 14:00 617041:  168 hours 611486:720 hours'
match = re.split(r'(\d{6}\:)', my_str)
print(match)
for item in match:
    print(item)

The result should read 610640: 168 hours, then 610835: till next day 14:00, then 617041: 168 hours, and so on. Another regex I've tried:

(\d{6}\:) .*?(\d{6}\:)

I've been using https://pythex.org/ to get an idea of how to write the regex.

CodePudding user response:

With the match you are almost there, but you should turn the last part into a lookahead instead of a match, and use an alternation that asserts the end of the string so that the last item is matched as well.

In this part (\d{6}\:) you can omit the capture group, as well as any explicit {1} repetition since the part occurs only 1 time, and the colon does not have to be escaped.

 \b\d{6}:.*?(?=\s*(?:\d{6}:|$))
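
As a rough sketch of how this matching pattern could be used, re.findall returns each item directly, so no stripping is needed afterwards. The names pattern and items are only illustrative, and the sketch assumes the input has no newlines (the dot does not match them by default):

```python
import re

my_str = '610640: 168 hours 610835: till next day 14:00 617041:  168 hours 611486:720 hours'

# Match 6 digits and a colon, then lazily consume text until the next
# "6 digits + colon" (optionally preceded by whitespace) or the end of the string.
pattern = r'\b\d{6}:.*?(?=\s*(?:\d{6}:|$))'
items = re.findall(pattern, my_str)

for item in items:
    print(item)
```

With the sample string this yields four items, starting with 610640: 168 hours and ending with 611486:720 hours. The whitespace between items is left out of each match because the lookahead only asserts it and does not consume it.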


If you want to use re.split, you might also use the following pattern (splitting on an empty match requires Python 3.7 or later):

(?<!^)\b(?=\d{6}:)


import re
my_str = '610640: 168 hours 610835: till next day 14:00 617041:  168 hours 611486:720 hours'
match = re.split(r'(?<!^)\b(?=\d{6}:)', my_str)
print(match)
for item in match:
    print(item.strip())

Output

['610640: 168 hours ', '610835: till next day 14:00 ', '617041:  168 hours ', '611486:720 hours']
610640: 168 hours
610835: till next day 14:00
617041:  168 hours
611486:720 hours

If each 6-digit group after the first is always preceded by 1 or more whitespace chars, you could match those to split on and omit the word boundary:

match = re.split(r'\s+(?=\d{6}:)', my_str)
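
A self-contained version of this whitespace-based split might look like the sketch below. It assumes the pattern is \s+(?=\d{6}:), i.e. one or more whitespace chars followed by 6 digits and a colon, and the name parts is only illustrative:

```python
import re

my_str = '610640: 168 hours 610835: till next day 14:00 617041:  168 hours 611486:720 hours'

# Split on the whitespace run itself, but only where it is directly
# followed by 6 digits and a colon. The whitespace is consumed by the
# split, so the resulting parts need no strip().
parts = re.split(r'\s+(?=\d{6}:)', my_str)

print(parts)
```

Because the separator is a real (non-empty) match here, this variant does not rely on splitting at empty matches. Whitespace inside an item, such as the double space in 617041:  168 hours, is kept as-is.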
