Split with pair of brackets and keep content between

Time:05-30

I'm trying to split a string into a list of strings. I need to split whenever I see any of these characters: '.', ';', ':', '?', '!', '( )', '[ ]', '{ }' (keeping whatever is inside the brackets). To solve it I tried:

import re
print(re.split(r"\(([^)]*)\)|[.,;:?!]\s*", "Hello world,this is(example)"))

but as output I get:

['Hello world', None, 'this is', 'example', '']

Leaving aside the '' at the end, which I'll deal with later, how can I remove the None that appears in the middle of the list? I can't iterate over the list a second time, because the program has to work with huge files and needs to be as fast as possible. I also don't have to use re.split, so anything that works will be fine!

I'm still new at this so I'm sorry if something is incorrect.

CodePudding user response:

Not sure if this is fast enough but you could do this:

re.sub(r";|,|:|\(|\)|\[|\]|\?|\.|\{|\}|!", " ", "Hello world,this is(example)").split()
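Note that the one-liner above also splits on spaces, so 'Hello world' comes back as two items. The None in the question's output appears because re.split includes every capturing group in its result, and the group is None whenever the match came from the punctuation branch instead of the bracket branch. A possible alternative, as a minimal sketch: match the text between delimiters with re.findall rather than splitting on the delimiters. The function name split_keep_bracket_content is made up for illustration, and the sketch assumes brackets are never nested and that surrounding whitespace should be stripped from each piece.

```python
import re

# Anything that is NOT one of the delimiters from the question
# (punctuation from the asker's regex plus the three bracket pairs).
TOKEN = re.compile(r"[^.,;:?!()\[\]{}]+")

def split_keep_bracket_content(text):
    """Return the non-empty pieces between delimiters, whitespace-stripped.

    Hypothetical helper: brackets are treated as plain delimiters,
    so their content is kept as its own piece (no nesting support).
    """
    pieces = (piece.strip() for piece in TOKEN.findall(text))
    return [piece for piece in pieces if piece]

print(split_keep_bracket_content("Hello world,this is(example)"))
# ['Hello world', 'this is', 'example']
```

Since there is no capturing group, no None can appear, and the empty string at the end disappears as well. For huge files it may be worth calling this per line, or switching to TOKEN.finditer so the pieces are produced lazily instead of being collected into one big list.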