What is the best way to split a string into two parts in Python, scanning from right to left until one of several characters occurs?
The aim is to separate a string into two parts with a version number (beginning with either A, B or C) at the end for examples like these:
- EP3293036A1 -> EP3293036 A1
- US10661612B2 -> US10661612 B2
- CN107962948A -> CN107962948 A
- ES15258411C1 -> ES15258411 C1
My code works for splitting the string for a single character:
first_part = number.rpartition('A')[0]
second_part = number.rpartition('A')[1] + number.rpartition('A')[2]
Is there a way to have multiple arguments like ('A' or 'B' or 'C') using rpartition? Or is there a better way using regex?
CodePudding user response:
Try this.
import re
def split_re(s):
    # Split at the zero-width position between a digit and A, B or C (needs Python 3.7+).
    # Change `ABC` to `A-Za-z` to split before any letter that follows a digit (e.g. 'a', 'z', 'Y').
    return re.split(r'(?<=\d)(?=[ABC])', s)
print(split_re('EP3293036A1')) # -> ['EP3293036', 'A1']
print(split_re('US10661612B2')) # -> ['US10661612', 'B2']
print(split_re('CN107962948A')) # -> ['CN107962948', 'A']
print(split_re('ES15258411C1')) # -> ['ES15258411', 'C1']
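Note that re.split returns a one-element list when it finds no split point, so code that unpacks two values may want a guard. Below is a minimal sketch of one way to do that, assuming Python 3.7+ (needed for splitting on zero-width matches). The names split_pair and _SPLIT_POINT are made up for illustration, and the pattern only accepts a suffix at the very end of the string:

```python
import re

# Zero-width split point: after a digit, before a trailing A/B/C plus optional digits.
_SPLIT_POINT = re.compile(r'(?<=\d)(?=[ABC]\d*$)')

def split_pair(s):
    """Return (number, version); version is '' when no suffix is found."""
    parts = _SPLIT_POINT.split(s, maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    # No split point: hand back the input unchanged with an empty version.
    return s, ''

for s in ['EP3293036A1', 'CN107962948A', 'EP3293036']:
    print(split_pair(s))
```

This always returns a pair, so `number, version = split_pair(s)` is safe even for inputs without a version suffix.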
CodePudding user response:
Use re.findall. With the regex shown, it extracts the parts in parentheses:
- (.*?) : any characters repeated 0 or more times, non-greedy;
- ([AB]\d*)$ : A or B, followed by 0 or more digits, followed by the end of the string.
import re
lst = ['EP3293036A1', 'EP3293036B']
for s in lst:
    # Add C to the character class ([ABC]) to also match versions starting with C.
    parts = re.findall(r'(.*?)([AB]\d*)$', s)
    print(f's={s}; parts={parts}')
# s=EP3293036A1; parts=[('EP3293036', 'A1')]
# s=EP3293036B; parts=[('EP3293036', 'B')]
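As for the rpartition part of the question: rpartition accepts only a single separator, but the same right-to-left idea can be sketched without regex by taking the rightmost index over several str.rfind calls. This is only an illustration. split_version and its markers parameter are made-up names, and it assumes the last A, B or C in the string is always where the version starts:

```python
def split_version(number, markers="ABC"):
    """Split `number` at the rightmost occurrence of any character in `markers`."""
    # str.rfind returns -1 when the character is absent, so max() picks
    # the rightmost marker that is actually present.
    idx = max(number.rfind(ch) for ch in markers)
    if idx == -1:
        # None of the markers occur: no version part.
        return number, ""
    return number[:idx], number[idx:]

for s in ["EP3293036A1", "US10661612B2", "CN107962948A", "ES15258411C1"]:
    print(split_version(s))
```

Unlike the regex versions, this does not check that only digits follow the marker, so it suits input that is already known to be well formed.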