I basically need to split a string before each run of 6 digits followed by a colon:
import re
my_str = '610640: 168 hours 610835: till next day 14:00 617041: 168 hours 611486:720 hours'
match = re.split(r'(\d{6}\:)', my_str)
print(match)
for item in match:
    print(item)
to read 610640: 168 hours
and 610835: till next day 14:00
and 617041: 168 hours
and so on. Other regex I've tried:
(\d{6}\:) .*?(\d{6}\:)
I've been using https://pythex.org/ to get an idea of how to write the regex.
CodePudding user response:
You are almost there with the pattern, but you should turn the last part into a lookahead instead of a match, and use an alternation that asserts the end of the string so that the last item is matched as well.
In this part (\d{6}\:)
you can omit the group and the repetition, as it occurs only once, and the colon does not have to be escaped.
\b\d{6}:.*?(?=\s*(?:\d{6}:|$))
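As a minimal sketch of how that pattern could be used, here it is with re.findall on the string from the question. The variable names pattern and items are only illustrative and are not part of the original answer.

```python
import re

my_str = '610640: 168 hours 610835: till next day 14:00 617041: 168 hours 611486:720 hours'

# Match a 6-digit id and its colon, then as little text as possible, stopping
# just before optional whitespace followed by the next id or the end of the string.
pattern = r'\b\d{6}:.*?(?=\s*(?:\d{6}:|$))'

items = re.findall(pattern, my_str)
for item in items:
    print(item)
```

Because the lookahead leaves the whitespace before the next id outside the match, the items come back without trailing spaces and no strip() call is needed.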
If you want to use re.split, you might also use:
(?<!^)\b(?=\d{6}:)
import re
my_str = '610640: 168 hours 610835: till next day 14:00 617041: 168 hours 611486:720 hours'
match = re.split(r'(?<!^)\b(?=\d{6}:)', my_str)
print(match)
for item in match:
    print(item.strip())
Output
['610640: 168 hours ', '610835: till next day 14:00 ', '617041: 168 hours ', '611486:720 hours']
610640: 168 hours
610835: till next day 14:00
617041: 168 hours
611486:720 hours
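The (?<!^) part is what keeps the split from happening at the very start of the string. A small sketch of the difference, assuming Python 3.7 or later (where re.split also splits on empty matches); the names without_guard and with_guard are made up for this illustration:

```python
import re

my_str = '610640: 168 hours 610835: till next day 14:00 617041: 168 hours 611486:720 hours'

# Without the guard, the empty match at position 0 is also a split point,
# so the result starts with an empty string.
without_guard = re.split(r'\b(?=\d{6}:)', my_str)

# With (?<!^), the position at the start of the string is skipped.
with_guard = re.split(r'(?<!^)\b(?=\d{6}:)', my_str)

print(len(without_guard), repr(without_guard[0]))
print(len(with_guard), repr(with_guard[0]))
```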
If there are always 1 or more leading whitespace chars, you could match them to split on and omit the word boundary:
match = re.split(r'\s+(?=\d{6}:)', my_str)
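A self-contained version of this variant, as a sketch that assumes every id except the first is preceded by at least one whitespace character (which holds for the sample string). The name parts is only illustrative.

```python
import re

my_str = '610640: 168 hours 610835: till next day 14:00 617041: 168 hours 611486:720 hours'

# Split on the whitespace itself, but only where a 6-digit id and a colon follow.
# The whitespace is consumed by the split, so the pieces need no strip().
parts = re.split(r'\s+(?=\d{6}:)', my_str)

for part in parts:
    print(part)
```

Since the separators are consumed, this avoids the trailing spaces that the zero-width split leaves on each item.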