How to split a string in Python until one of specific characters occurs from right to left?

Time:06-07

What is the best way to split a string into two parts in Python, scanning from right to left until one of several characters occurs?

The aim is to separate the string into two parts, with the version suffix (beginning with A, B or C) as the second part, as in these examples:

  • EP3293036A1 -> EP3293036 A1
  • US10661612B2 -> US10661612 B2
  • CN107962948A -> CN107962948 A
  • ES15258411C1 -> ES15258411 C1

My code works for splitting the string on a single character:

first_part = number.rpartition('A')[0]
second_part = number.rpartition('A')[1] + number.rpartition('A')[2]

Is there a way to have multiple arguments like ('A' or 'B' or 'C') using rpartition? Or is there a better way using regex?

CodePudding user response:

Try this.

import re

def split_re(s):
    # Zero-width split just before the trailing A/B/C suffix
    # (splitting on an empty match needs Python 3.7+).
    # Change `[ABC]` to `[A-Za-z]` to split before any trailing letter, like 'a', 'z' or 'Y'.
    return re.split(r'(?=[ABC]\d*$)', s)
print(split_re('EP3293036A1'))  # -> ['EP3293036', 'A1']
print(split_re('US10661612B2')) # -> ['US10661612', 'B2']
print(split_re('CN107962948A')) # -> ['CN107962948', 'A']
print(split_re('ES15258411C1')) # -> ['ES15258411', 'C1']
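
For comparison, here is a minimal sketch of the same split without a regex, staying closer to the rpartition idea from the question. The function name split_at_last and its chars parameter are made up for illustration. It only looks for the rightmost A, B or C and does not check what follows it, so it assumes the input really ends in such a suffix.

```python
def split_at_last(s, chars="ABC"):
    """Split s just before the rightmost occurrence of any character in chars."""
    # str.rfind returns -1 when a character is absent, so max() yields the
    # rightmost hit, or -1 if none of the characters occur at all.
    idx = max(s.rfind(c) for c in chars)
    if idx == -1:
        return s, ""  # nothing to split on: keep the whole string as the first part
    return s[:idx], s[idx:]


print(split_at_last("EP3293036A1"))   # ('EP3293036', 'A1')
print(split_at_last("CN107962948A"))  # ('CN107962948', 'A')
```

Note that the leading C in 'CN107962948A' is ignored because the trailing A sits further to the right.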

CodePudding user response:

Use re.findall, which returns the parts captured by the parentheses in the regex: (.*?) matches any characters, 0 or more times, non-greedily; ([ABC]\d*)$ matches A, B or C, followed by 0 or more digits, followed by the end of the string.

import re
lst = ['EP3293036A1', 'EP3293036B', 'ES15258411C1']

for s in lst:
    # findall returns a list with one (first_part, suffix) tuple per match
    parts = re.findall(r'(.*?)([ABC]\d*)$', s)
    print(f's={s}; parts={parts}')

# s=EP3293036A1; parts=[('EP3293036', 'A1')]
# s=EP3293036B; parts=[('EP3293036', 'B')]
# s=ES15258411C1; parts=[('ES15258411', 'C1')]
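
If a plain tuple is more convenient than a list of tuples, the same pattern can be wrapped in a small helper. This is only a sketch: the names SUFFIX_RE and split_suffix are hypothetical, and it uses re.fullmatch (so no trailing $ is needed) and returns None when the string has no A/B/C suffix.

```python
import re

# Hypothetical helper pattern: lazy prefix, then A, B or C plus optional digits.
SUFFIX_RE = re.compile(r"(?P<number>.*?)(?P<suffix>[ABC]\d*)")


def split_suffix(s):
    """Return (number, suffix), or None if s does not end in an A/B/C suffix."""
    m = SUFFIX_RE.fullmatch(s)  # the whole string must match the pattern
    if m is None:
        return None
    return m.group("number"), m.group("suffix")


print(split_suffix("EP3293036A1"))   # ('EP3293036', 'A1')
print(split_suffix("CN107962948A"))  # ('CN107962948', 'A')
print(split_suffix("EP3293036"))     # None
```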