I'm using the following function to split a string at its capital letters and join the pieces with commas, as long as the capital is not preceded by a blank space.
import re

def func(x):
    y = re.findall(r'[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*', x)
    return ','.join(y)
However, when I apply it to the following string, it drops the leading part that contains the numbers.
Input = '49ersRiders Mapple'
Output = 'Riders Mapple'
I tried the following code but now it removes the 'ers' part.
def test(x):
    y = re.findall(r'\d+[A-Z]*|[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*', x)
    return ','.join(y)
Output = '49,Riders Mapple'
The output I'm looking for is this:
'49ers,Riders Mapple'
Is it possible to add this condition to my regex?
Thanks in advance
CodePudding user response:
Here is a regex re.findall approach:
import re
inp = "49ersRiders"
output = ','.join(re.findall(r'(?:[A-Z]|[0-9])[^A-Z]+', inp))
print(output)  # 49ers,Riders
The regex pattern used here says to match:
(?:
    [A-Z]    a leading uppercase letter (try to find this first)
    |        OR
    [0-9]    a leading number (fallback for no uppercase)
)
[^A-Z]+      one or more non-capital characters following
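The breakdown above can also be written directly into the pattern with re.VERBOSE, which may make it easier to adjust. This is only a sketch: the names PATTERN and split_on_capitals are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE so that
# each part carries its own comment (whitespace in the pattern is ignored).
PATTERN = re.compile(r"""
    (?:
        [A-Z]     # a leading uppercase letter
      |           # OR
        [0-9]     # a leading digit
    )
    [^A-Z]+       # one or more characters that are not uppercase letters
""", re.VERBOSE)

def split_on_capitals(text):
    # Join every match with a comma (hypothetical helper name)
    return ','.join(PATTERN.findall(text))

print(split_on_capitals("49ersRiders"))         # 49ers,Riders
print(split_on_capitals("49ersRiders Mapple"))  # 49ers,Riders ,Mapple
```

Note that with the full input from the question this pattern also splits before 'Mapple', because it has no rule about capitals that follow a space.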
CodePudding user response:
Maybe naive, but why don't you use re.sub:
import re

def func(x):
    # Insert a comma before every capital letter not preceded by whitespace
    return re.sub(r'(?<!\s)([A-Z])', r',\1', x)

inp = '49ersRiders Mapple'
out = func(inp)
print(out)
# Output
49ers,Riders Mapple
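One thing to watch for, in case it matters for your data: if the string itself starts with a capital, the negative lookbehind also succeeds at position 0 and a leading comma is inserted. A possible variant, assuming you never want a comma at the very start, is to require a non-whitespace character before the capital instead. The function names below are hypothetical and only there for comparison.

```python
import re

def func_original(x):
    # Pattern from the answer above: a capital not preceded by whitespace
    return re.sub(r'(?<!\s)([A-Z])', r',\1', x)

def func_no_leading_comma(x):
    # Variant (an assumption, not from the answer): the capital must be
    # preceded by a non-whitespace character, which excludes position 0
    return re.sub(r'(?<=\S)([A-Z])', r',\1', x)

print(func_original('RidersMapple'))                # ,Riders,Mapple
print(func_no_leading_comma('RidersMapple'))        # Riders,Mapple
print(func_no_leading_comma('49ersRiders Mapple'))  # 49ers,Riders Mapple
```

For the input in the question both versions give the same result; they only differ when the first character is a capital.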