Split a text by specific word or phrase and keep the word in Python

Time:10-13

Is there an elegant way to split a text by a word and keep the word as well? There are some workarounds using split from the re package with a pattern (for example, "Python RE library String Split but keep the delimiters/separators as part of the next string"), but none of them works for this scenario, where the delimiter is repeated multiple times. For example:

 s = "I want to split text here, and also keep here, and return all as list items"

Using partition:

 s.partition("here")
>> ('I want to split text ', 'here', ', and also keep here, and return all as list items')

Using re.split():

 re.split("here", s)
>> ['I want to split text ', ', and also keep ', ', and return all as list items']

The desired output should be something like the following list:

['I want to split text', 'here', ' , and also keep ', 'here', ' , and return all as list items']

CodePudding user response:

Yes. What you're looking for is a feature of the re.split() function. If the pattern contains a capture group, the text matched by the group is returned in the result list as well:

import re

s = "I want to split text here, and also keep here, and return all as list items"

r = re.split('(here)', s)

print(r)

Result:

['I want to split text ', 'here', ', and also keep ', 'here', ', and return all as list items']
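If the word or phrase comes from user input, it may contain characters that are special in a regex (such as "." or "?"). A minimal sketch of a wrapper, assuming you want a literal match: split_keep and its drop_empty parameter are hypothetical names for illustration, not part of the re module.

```python
import re

def split_keep(text, phrase, drop_empty=True):
    # re.escape() makes the phrase match literally, even if it contains
    # regex metacharacters; the parentheses form the capture group that
    # makes re.split() keep the delimiter.
    parts = re.split("(" + re.escape(phrase) + ")", text)
    # re.split() yields '' when the phrase sits at the very start or end
    # of the text; optionally drop those empty items.
    return [p for p in parts if p] if drop_empty else parts

print(split_keep("a.b.c", "."))
# ['a', '.', 'b', '.', 'c']
```

With drop_empty=False the empty strings at the edges are kept, which is what plain re.split() returns.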

If you define multiple capture groups, each group's match is returned as a separate list item. So you can keep just a part of the delimiter, or several parts that are each returned on their own. I've done some fairly crazy things with this feature in the past. It can replace an appreciable amount of code that would otherwise be necessary.
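A small illustration of that, using the same sentence. The patterns here are just examples chosen for this string: the first keeps the word and the comma as separate items, the second keeps only the word and discards the comma.

```python
import re

s = "I want to split text here, and also keep here, and return all as list items"

# Two capture groups: each one becomes its own list item.
both = re.split(r"(here)(,)", s)
print(both)
# ['I want to split text ', 'here', ',', ' and also keep ', 'here', ',', ' and return all as list items']

# Only part of the delimiter is captured: the comma is matched but dropped.
word_only = re.split(r"(here),", s)
print(word_only)
# ['I want to split text ', 'here', ' and also keep ', 'here', ' and return all as list items']
```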

CodePudding user response:

Using re is no doubt the best way, but you could also apply the partition() method recursively.

def partitions(whole_string, split_string):
    # Split at the first occurrence, then recurse on the remainder.
    head, sep, tail = whole_string.partition(split_string)
    if not sep:  # no occurrence left: the rest is the last item
        return [whole_string]
    return [head, sep, *partitions(tail, split_string)]
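The recursive version makes one call per occurrence, so a text with very many occurrences could hit Python's recursion limit. A loop-based sketch of the same idea is below; partitions_iter is a hypothetical name, and it is meant to return the same list as the recursive function.

```python
def partitions_iter(whole_string, split_string):
    parts = []
    rest = whole_string
    while True:
        # partition() splits at the first occurrence only.
        head, sep, tail = rest.partition(split_string)
        if not sep:
            # No occurrence left: whatever remains is the last item.
            parts.append(rest)
            return parts
        parts.extend([head, sep])
        rest = tail

s = "I want to split text here, and also keep here, and return all as list items"
print(partitions_iter(s, "here"))
# ['I want to split text ', 'here', ', and also keep ', 'here', ', and return all as list items']
```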