Split a string with a regex pattern when it contains words and numbers

This question came up while I was trying to help with another post.

I'm looking for a regex pattern that splits this string at 1., 2. and 3., or more generally at one or more digits followed by a dot. The problem is that the string contains other numbers that must be kept.

import re

test_string = '1. Fruit 12 oranges 2. vegetables 7 carrot 3. NFL 246 SHIRTS'

With this pattern I managed to do so, but I got an empty string at the start and didn't know how to change that.

l1 = re.split(r"\s?\d{1,2}\.", test_string)

Output l1:
['', ' Fruit 12 oranges', ' vegetables 7 carrot', ' NFL 246 SHIRTS']

So I switched from splitting to searching for the parts that match a pattern.

l2 = re.findall(r"(?:^|(?<=\d\.))([\sa-zA-Z0-9]+)(?:\d\.|$)", test_string)

Output l2:
[' Fruit 12 oranges ', ' vegetables 7 carrot ', ' NFL 246 SHIRTS']

This is really close. The only remaining problem is the leading whitespace on every element, plus the trailing whitespace on all but the last.

What would be a good and efficient approach for this task? Should I stick with re.split(), or build a pattern and use re.findall()? Is my pattern reasonable as written, or is it more complicated than it needs to be?

CodePudding user response：

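You can get there by adding \s to your expression in two places: a non-capturing (?:\s) right before the capture group, and a \s at the start of the final group, so the surrounding spaces are consumed outside the capture:

re.findall(r"(?:^|(?<=\d\.))(?:\s)([\sa-zA-Z0-9]+)(?:\s\d\.|$)", test_string)

The output is: ['Fruit 12 oranges', 'vegetables 7 carrot', 'NFL 246 SHIRTS']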
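For comparison, here is a minimal sketch of the re.split() route from the question. It lets the separator pattern swallow the whitespace around each list marker and then drops the empty string that appears before the first marker. The helper name split_numbered_items is only an illustration, not something from the original post, and the sketch assumes every marker is a run of digits followed by a dot and at least one whitespace character.

```python
import re

test_string = '1. Fruit 12 oranges 2. vegetables 7 carrot 3. NFL 246 SHIRTS'


def split_numbered_items(text):
    """Hypothetical helper: split text at list markers such as '1.' or '12.'."""
    # \s*   optional whitespace before the marker
    # \b    the digits must start at a word boundary (so 'a1.' is not a marker)
    # \d+\. one or more digits followed by a literal dot
    # \s+   whitespace after the dot (so a decimal like '3.5' is not a marker)
    parts = re.split(r"\s*\b\d+\.\s+", text)
    # re.split yields '' before the first marker; filter empty pieces out.
    return [part for part in parts if part]


print(split_numbered_items(test_string))
```

Because the separator already consumes the spaces on both sides of each marker, no extra strip() call is needed, and plain numbers such as 12 or 246 are left alone because they are not followed by a dot. One caveat: a number that ends a sentence inside an item (for example "oranges 12. ") would also be treated as a marker under these assumptions.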