Python re: find pattern and split into list

I am looking to split the following string into the list as shown below:

string = 'M43.16: Spondylolisthesis, lumbar region, ' \
    'M51.27: Other intervertebral disc displacement, lumbosacral region, ' \
    'M54.17: Radiculopathy, lumbosacral region'

Codes like "M43.16" can vary in length. Each one starts with a capital letter, followed by digits with one decimal point, and may end with a lowercase letter. A code is followed by its description, a comma, and the body region; the next comma (the second one in the entry) separates it from the next code.

Desired list should split before the start of the next code and its description:

list = ['M43.16: Spondylolisthesis, lumbar region', 'M51.27: Other intervertebral disc displacement, lumbosacral region', 'M54.17: Radiculopathy, lumbosacral region']

This is what I've tried so far, but it fails to stop the match before the next code:

re.findall("[A-Z][A-Z0-9. ]*: [A-Za-z, ]* [A-Za-z, ]", string)

CodePudding user response：

re.findall(r"[A-Z][A-Z0-9.]*: [A-Za-z ]+, [A-Za-z ]+", string)
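
To make it easier to check, here is a small runnable sketch of that pattern against the string from the question. It assumes every entry has exactly one comma between the description and the region. The optional [a-z]? after the code is my own addition (an assumption, not part of the answer above) to cover codes that end with a lowercase letter, as the question describes.

```python
import re

string = ('M43.16: Spondylolisthesis, lumbar region, '
          'M51.27: Other intervertebral disc displacement, lumbosacral region, '
          'M54.17: Radiculopathy, lumbosacral region')

# code, colon, description, comma, region.
# [a-z]? is an assumed extension for codes with a lowercase suffix.
pattern = r"[A-Z][A-Z0-9.]*[a-z]?: [A-Za-z ]+, [A-Za-z ]+"

result = re.findall(pattern, string)
for entry in result:
    print(entry)
```

Because neither character class contains a comma, each match stops at the comma that precedes the next code, so there is nothing to strip afterwards.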

CodePudding user response：

Here is a working solution. The second part of the pattern is a repeated group that excludes ":", and a comma is appended to the string so that the last entry matches too.

import re
re.findall(r"[A-Z][A-Z0-9. ]*:(?:\s*[^:]+,)+", string + ',')

output:

['M43.16: Spondylolisthesis, lumbar region,',
 'M51.27: Other intervertebral disc displacement, lumbosacral region,',
 'M54.17: Radiculopathy, lumbosacral region,']

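
If the trailing commas are not wanted, one option is to strip them after matching. This is only a sketch built on the pattern above; the names matches and cleaned are mine.

```python
import re

string = ('M43.16: Spondylolisthesis, lumbar region, '
          'M51.27: Other intervertebral disc displacement, lumbosacral region, '
          'M54.17: Radiculopathy, lumbosacral region')

# Append a comma so the last entry also ends with one and can match.
pattern = r"[A-Z][A-Z0-9. ]*:(?:\s*[^:]+,)+"
matches = re.findall(pattern, string + ',')

# Each match ends with the comma required by the pattern; drop it.
cleaned = [m.rstrip(',') for m in matches]
print(cleaned)
```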

CodePudding user response：

My attempt: [\w.]*: [A-Za-z, ]*[^,\sA-Z\d]. No trailing commas or spaces.

import re
s = "M43.16: Spondylolisthesis, lumbar region, M51.27: Other intervertebral disc displacement, lumbosacral region, M54.17: Radiculopathy, lumbosacral region"
re.findall(r"[\w.]*: [A-Za-z, ]*[^,\sA-Z\d]", s)

['M43.16: Spondylolisthesis, lumbar region',
 'M51.27: Other intervertebral disc displacement, lumbosacral region',
 'M54.17: Radiculopathy, lumbosacral region']
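
Since the title asks about splitting, another option is re.split with a lookahead, which splits on a comma only when a code follows it. This is a sketch, not a tested general solution: the code pattern (one capital letter, digits, a dot, digits, optional lowercase letter, then a colon) is my reading of the question's description, and split_codes is just a hypothetical helper name.

```python
import re

# Assumed shape of a code, taken from the question's description.
CODE = r"[A-Z]\d+\.\d+[a-z]?:"

def split_codes(text):
    # Split on ", " only where a code starts right after it; the
    # lookahead (?=...) checks for the code without consuming it.
    return re.split(r",\s*(?=" + CODE + ")", text)

s = "M43.16: Spondylolisthesis, lumbar region, M51.27: Other intervertebral disc displacement, lumbosacral region, M54.17: Radiculopathy, lumbosacral region"
parts = split_codes(s)
for part in parts:
    print(part)
```

Unlike the findall patterns, this does not restrict which characters the description may contain, so digits or punctuation in a description would not break the split.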