Split a string after multiple delimiters and include it

Time:09-28

Hello, I'm trying to split a string without removing the delimiter, and the string can contain multiple different delimiters.

The delimiters can be 'D', 'M' or 'Y'. For example:

>>> string = '1D5Y4D2M'
>>> re.split(someregex, string)  # should ideally return
['1D', '5Y', '4D', '2M']

To keep the delimiter, I used the approach from "Python split() without removing the delimiter":

>>> re.split('([^D]+D)', '1D5Y4D2M')
['', '1D', '', '5Y4D', '2M']

For multiple delimiters, I used the approach from "In Python, how do I split a string and keep the separators?":

>>> re.split('(D|M|Y)', '1D5Y4D2M')
['1', 'D', '5', 'Y', '4', 'D', '2', 'M', '']

Combining both doesn't quite make it.

>>> re.split('([^D]+D|[^M]+M|[^Y]+Y)', string)
['', '1D', '', '5Y4D', '', '2M', '']

Any ideas?

CodePudding user response:

I'd use findall() in your case. How about:

re.findall(r'\d+[DYM]', string)

Which will result in:

['1D', '5Y', '4D', '2M']
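One thing to keep in mind with findall() is that it silently skips any text the pattern does not match. The following is a minimal sketch, not part of the original answer, that wraps the call and checks that nothing was dropped; the helper name split_tokens is hypothetical.

```python
import re

def split_tokens(text):
    # Collect every run of digits followed by one of D, Y or M.
    tokens = re.findall(r"\d+[DYM]", text)
    # findall() ignores unmatched text, so verify the tokens rebuild the input.
    if "".join(tokens) != text:
        raise ValueError(f"unexpected characters in {text!r}")
    return tokens

print(split_tokens("1D5Y4D2M"))  # ['1D', '5Y', '4D', '2M']
```

If silently ignoring stray characters is acceptable for your data, the plain findall() call is enough.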

CodePudding user response:

(?<=(?:D|Y|M))

You need to split on a zero-width assertion. This can be done using the regex module in Python.

See demo.

https://regex101.com/r/aKV13g/1
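As a hedged aside: since Python 3.7 the standard re module also splits on empty matches, so the lookbehind can be tried without a third-party package. A minimal sketch, assuming Python 3.7 or later (split_after is a hypothetical helper name):

```python
import re

def split_after(text):
    # Split at every position that is immediately preceded by D, Y or M.
    parts = re.split(r"(?<=[DYM])", text)
    # A delimiter at the very end produces a trailing '', so drop empty pieces.
    return [p for p in parts if p]

print(split_after("1D5Y4D2M"))  # ['1D', '5Y', '4D', '2M']
```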

CodePudding user response:

You can split at the locations right after D, Y or M but not at the end of the string with

re.split(r'(?<=[DYM])(?!$)', text)

See the regex demo. Details:

  • (?<=[DYM]) - a positive lookbehind that matches a location that is immediately preceded with D or Y or M
  • (?!$) - a negative lookahead that fails the match if the current position is the string end position.
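The pattern described above can be tried out with a short sketch like the one below. It assumes Python 3.7 or later, where re.split() accepts empty matches; the names PATTERN and split_keep_delims are only illustrative.

```python
import re

# Lookbehind: position preceded by D, Y or M. Lookahead: not at the string end.
PATTERN = re.compile(r"(?<=[DYM])(?!$)")

def split_keep_delims(text):
    # Each piece keeps its trailing delimiter; no empty string at the end.
    return PATTERN.split(text)

print(split_keep_delims("1D5Y4D2M"))  # ['1D', '5Y', '4D', '2M']
print(split_keep_delims("1D5Y4"))     # ['1D', '5Y', '4']
```

The second call shows that a trailing chunk without a delimiter is kept rather than dropped.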

Note

In the current scenario, (?<=[DYM]) can be used instead of the more verbose (?<=D|Y|M) since all alternatives are single characters. If you have multichar delimiters of different lengths, you would have to use a non-capturing group, (?:...), with one lookbehind per alternative inside it. For example, to split right after Y, DX and MZB you would use (?:(?<=Y)|(?<=DX)|(?<=MZB)). See Python Regex Engine - "look-behind requires fixed-width pattern" Error.
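To illustrate the multichar case from the note, here is a minimal sketch using the delimiters Y, DX and MZB mentioned above. The sample string is made up for the demonstration, and Python 3.7 or later is assumed.

```python
import re

# One fixed-width lookbehind per delimiter, grouped without capturing,
# plus the same "not at the end" lookahead as before.
pattern = re.compile(r"(?:(?<=Y)|(?<=DX)|(?<=MZB))(?!$)")

sample = "1Y2DX3MZB"  # hypothetical input
print(pattern.split(sample))  # ['1Y', '2DX', '3MZB']
```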

CodePudding user response:

I think it will work fine without regex or split, with O(n) time complexity:

string = '1D5Y4D2M'
temp = ''
res = []
for x in string:
    temp += x  # accumulate the current chunk
    if x in 'DMY':
        # a delimiter closes the chunk: store it and start a new one
        res.append(temp)
        temp = ''
print(res)

CodePudding user response:

Using translate:

string = '1D5Y4D2M'

delimiters = ['D', 'Y', 'M']
result = string.translate({ord(c): f'{c}*' for c in delimiters}).strip('.*').split('*')
print(result)

>>> ['1D', '5Y', '4D', '2M']