Hello, I'm trying to split a string without removing the delimiters, and there can be several different delimiters.
The delimiters can be 'D', 'M' or 'Y'. For example:
>>> string = '1D5Y4D2M'
>>> re.split(someregex, string)  # should ideally return
['1D', '5Y', '4D', '2M']
To keep the delimiter, I used the approach from "Python split() without removing the delimiter":
>>> re.split('([^D]+D)', '1D5Y4D2M')
['', '1D', '', '5Y4D', '2M']
For multiple delimiters, I used the approach from "In Python, how do I split a string and keep the separators?":
>>> re.split('(D|M|Y)', '1D5Y4D2M')
['1', 'D', '5', 'Y', '4', 'D', '2', 'M', '']
Combining both doesn't quite work:
>>> re.split('([^D]+D|[^M]+M|[^Y]+Y)', string)
['', '1D', '', '5Y4D', '', '2M', '']
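Side note: the output of the (D|M|Y) attempt already contains everything needed. The numbers sit at even indices and the captured delimiters at odd indices, so zipping the two slices re-attaches each delimiter to its number. This is a minimal sketch, assuming every number is followed by exactly one delimiter (a trailing number with no delimiter would be dropped by zip()); the names parts and result are illustrative only:

```python
import re

string = '1D5Y4D2M'

# Capturing the delimiter keeps it in the output:
# ['1', 'D', '5', 'Y', '4', 'D', '2', 'M', '']
parts = re.split('(D|M|Y)', string)

# Even indices hold the numbers, odd indices the captured delimiters.
# zip() stops at the shorter slice, so the trailing '' is ignored.
result = [num + delim for num, delim in zip(parts[::2], parts[1::2])]
print(result)  # ['1D', '5Y', '4D', '2M']
```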
Any ideas?
Answer:
I'd use findall() in your case. How about:
re.findall(r'\d+[DYM]', string)
Which will result in:
['1D', '5Y', '4D', '2M']
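A small sketch of the findall() approach wrapped in a helper. The function name split_keep_delims and the delims parameter are illustrative assumptions, not part of the answer. One caveat worth knowing: findall() silently skips any trailing characters that are not followed by a delimiter, as the last call shows.

```python
import re

def split_keep_delims(text, delims='DYM'):
    """Return every run of digits that is followed by a delimiter character."""
    # Build a character class such as [DYM]; re.escape guards against
    # characters that are special inside a class (e.g. ']' or '^').
    cls = '[' + ''.join(re.escape(d) for d in delims) + ']'
    return re.findall(r'\d+' + cls, text)

print(split_keep_delims('1D5Y4D2M'))  # ['1D', '5Y', '4D', '2M']
print(split_keep_delims('12D30M'))    # ['12D', '30M']
print(split_keep_delims('1D5Y7'))     # ['1D', '5Y'] (the trailing '7' is skipped)
```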
Answer:
(?<=(?:D|Y|M))
You need to split on a zero-width assertion. This can be done with the third-party regex module in Python.
See demo: https://regex101.com/r/aKV13g/1
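Since Python 3.7, the built-in re.split also accepts patterns that match an empty string, so the same lookbehind works without installing the regex module. A minimal sketch, assuming Python 3.7 or later. Because the input ends with a delimiter, there is also a split right after the final 'M', which leaves a trailing empty string that is filtered out here:

```python
import re

string = '1D5Y4D2M'

# Split at every position immediately preceded by D, Y or M.
# Zero-width splits require Python 3.7+ when using the built-in re module.
parts = re.split(r'(?<=(?:D|Y|M))', string)
print(parts)  # ['1D', '5Y', '4D', '2M', '']

# Drop the empty string produced by the split at the end of the input.
result = [p for p in parts if p]
print(result)  # ['1D', '5Y', '4D', '2M']
```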
Answer:
You can split at the locations right after D, Y or M, but not at the end of the string, with
re.split(r'(?<=[DYM])(?!$)', text)
See the regex demo. Details:
(?<=[DYM]) - a positive lookbehind that matches a location immediately preceded by D, Y or M
(?!$) - a negative lookahead that fails the match if the current position is the end of the string.
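Putting the two assertions together gives a split with no trailing empty string, so no filtering step is needed (this also relies on Python 3.7+ for zero-width splits). The function name split_after_delims and the second input string are made-up illustrations; the second call shows that a trailing chunk without a delimiter is kept:

```python
import re

def split_after_delims(text):
    # (?<=[DYM]): a position right after D, Y or M
    # (?!$): but not the very end of the string, so no '' is produced
    return re.split(r'(?<=[DYM])(?!$)', text)

print(split_after_delims('1D5Y4D2M'))  # ['1D', '5Y', '4D', '2M']
print(split_after_delims('10Y3M7'))    # ['10Y', '3M', '7']
```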
Note: In the current scenario, (?<=[DYM]) can be used instead of the more verbose (?<=D|Y|M), since all alternatives are single characters. If you have multi-character delimiters, you have to use a non-capturing group, (?:...), with lookbehind alternatives inside it. For example, to split right after Y, DX and MZB you would use (?:(?<=Y)|(?<=DX)|(?<=MZB)). See Python Regex Engine - "look-behind requires fixed-width pattern" Error.
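The multi-character case can be built programmatically from a list of delimiters, with one fixed-width lookbehind per delimiter. This is a sketch under the assumption that the delimiters are plain strings (they are passed through re.escape). The helper name split_after, the sample input '1DX5Y4MZB' and the rejected flag are hypothetical. The last part shows the built-in re module refusing a single lookbehind whose alternatives differ in width:

```python
import re

def split_after(text, delimiters):
    # One fixed-width lookbehind per delimiter, e.g. (?<=Y)|(?<=DX)|(?<=MZB)
    lookbehinds = '|'.join(f'(?<={re.escape(d)})' for d in delimiters)
    # (?!$) stops the split from firing at the very end of the string
    return re.split(f'(?:{lookbehinds})(?!$)', text)

print(split_after('1DX5Y4MZB', ['Y', 'DX', 'MZB']))  # ['1DX', '5Y', '4MZB']

# A single lookbehind with alternatives of different widths raises re.error.
rejected = False
try:
    re.compile(r'(?<=Y|DX|MZB)')
except re.error as exc:
    rejected = True
    print('rejected:', exc)
```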
Answer:
I think this works fine without regex or split, with O(n) time complexity:
string = '1D5Y4D2M'
temp = ''   # characters of the chunk being built
res = []
for x in string:
    if x == 'D':
        temp += 'D'
        res.append(temp)   # a delimiter closes the current chunk
        temp = ''
    elif x == 'M':
        temp += 'M'
        res.append(temp)
        temp = ''
    elif x == 'Y':
        temp += 'Y'
        res.append(temp)
        temp = ''
    else:
        temp += x          # not a delimiter: keep accumulating
print(res)
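The three near-identical branches can be collapsed by testing membership in a set of delimiters. A minimal sketch; the function name split_on_chars is hypothetical, and keeping a trailing chunk that has no delimiter is a design assumption (the loop above silently drops it):

```python
def split_on_chars(text, delims=frozenset('DMY')):
    """Single pass, O(n): close the current chunk whenever a delimiter is seen."""
    result = []
    current = []
    for ch in text:
        current.append(ch)
        if ch in delims:
            result.append(''.join(current))
            current = []
    # Assumption: keep leftover characters that are not followed by a delimiter.
    if current:
        result.append(''.join(current))
    return result

print(split_on_chars('1D5Y4D2M'))  # ['1D', '5Y', '4D', '2M']
print(split_on_chars('12D3'))      # ['12D', '3']
```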
Answer:
Using str.translate():
string = '1D5Y4D2M'
delimiters = ['D', 'Y', 'M']
result = string.translate({ord(c): f'{c}*' for c in delimiters}).strip('.*').split('*')
print(result)
# prints ['1D', '5Y', '4D', '2M']
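The translate trick works by appending a marker after each delimiter and then splitting on that marker, so it relies on the marker ('*' above) never appearing in the input. A sketch that makes the marker a parameter and checks for it; the function name split_with_translate and the default marker '\x00' are assumptions chosen for illustration:

```python
def split_with_translate(text, delimiters='DYM', marker='\x00'):
    if marker in text:
        raise ValueError('marker character occurs in the input')
    # Map each delimiter to itself followed by the marker, e.g. 'D' -> 'D\x00'
    table = {ord(c): c + marker for c in delimiters}
    # '1D5Y4D2M' -> '1D\x005Y\x004D\x002M\x00'; drop the final marker, then split
    return text.translate(table).rstrip(marker).split(marker)

print(split_with_translate('1D5Y4D2M'))  # ['1D', '5Y', '4D', '2M']
```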