I'm trying to split a string into a list of strings. Right now I have to split whenever I see any of these characters: '.', ',', ';', ':', '?', '!', '( )', '[ ]', '{ }' (keep in mind that whatever is inside the brackets must be preserved). To solve it, I tried:
import re
print(re.split(r"\(([^)]*)\)|[.,;:?!]\s*", "Hello world,this is(example)"))
but as output I get:
['Hello world', None, 'this is', 'example', '']
Leaving aside the empty string '' at the end, which I'll handle later, how can I get rid of the None in the middle of the list? I can't iterate over the list a second time, because the program has to process huge files and must be as fast as possible. Also, I don't necessarily have to use re.split, so anything that works is fine!
I'm still new at this, so I'm sorry if something is incorrect.
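For context, the None comes from how re.split treats capturing groups: when the pattern contains groups, the text matched by every group is also put into the result, and a group that did not take part in a particular match (here, the parenthesised group when only the comma matched) shows up as None. A minimal sketch, assuming the goal is simply to treat each delimiter character, brackets included, as a plain separator: with no capturing group there is no None, and the text inside the brackets still comes out as its own piece.

```python
import re

text = "Hello world,this is(example)"

# Original pattern: the capturing group ([^)]*) is echoed into the result,
# and it is None whenever the other alternative ([.,;:?!]) is the one that matched.
print(re.split(r"\(([^)]*)\)|[.,;:?!]\s*", text))
# ['Hello world', None, 'this is', 'example', '']

# No capturing group: every delimiter, including each bracket, is just a separator,
# so the bracket contents become an ordinary piece of the result.
print(re.split(r"[.,;:?!()\[\]{}]\s*", text))
# ['Hello world', 'this is', 'example', '']
```

The trailing '' is still there because the string ends with a delimiter, and two consecutive delimiters (for example "),") also leave an empty string between them.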
Answer:
Not sure if this is fast enough, but you could do this:
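import re
re.sub(r"[;,:()\[\]?.{}!]", " ", "Hello world,this is(example)").split()
Note that split() with no arguments also splits on the spaces inside each piece, so this returns ['Hello', 'world', 'this', 'is', 'example'] rather than keeping 'Hello world' together.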
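If phrases like 'Hello world' should stay in one piece, a different technique (not the one in the answer above) is to match the runs of text between delimiters with re.findall instead of splitting on the delimiters. This is a sketch; the names PIECE and split_pieces are hypothetical, chosen only for illustration, and the delimiter set is the one listed in the question.

```python
import re

# Hypothetical helper: any run of characters that is not one of the question's delimiters.
PIECE = re.compile(r"[^.,;:?!()\[\]{}]+")

def split_pieces(text):
    """Return the text between delimiters; empty pieces never appear."""
    return PIECE.findall(text)

print(split_pieces("Hello world,this is(example)"))
# ['Hello world', 'this is', 'example']
```

Unlike re.split, findall never yields None or empty strings, so no second pass is needed to clean the list. Whitespace right after a delimiter does stay in the piece, though (for example "a, b" gives ['a', ' b']). For very large inputs, PIECE.finditer(text) returns an iterator of match objects, so the pieces can be consumed one at a time instead of building the whole list first.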