I have a series of strings that looks like this:
str_one = "Coder X.Y. (Something Dogs Do X.Y.)"
str_two = "Something Cats Ignore X.Y. (Coder X.Y.)"
I want to extract the contents of the parentheses, unless they contain a specific substring (let's say "coder"), in which case I want everything outside the parentheses returned. For the above examples, I want the outputs to look like this:
output_str_one = "Something Dogs Do X.Y."
output_str_two = "Something Cats Ignore X.Y."
What I have so far only captures the substring in the parentheses if a certain word is present:
import re
re.findall(r'\(([^()]*Something[^()]*)\)', str_one)
That doesn't work for the second example obviously.
I could just split the string on the parentheses and then select the resulting piece that does not contain the substring 'coder', i.e.:
str_one_split = re.split(r"\(|\)", str_one)
str_one_split = list(filter(None, str_one_split))
res = [x for x in str_one_split if "coder" not in x.lower()]
print(res)
>>['Something Dogs Do X.Y.']
But really I was hoping for a single regular expression, so that I can add it to an expression being used in SQL.
Any help/guidance would be much appreciated!
CodePudding user response:
You may use this regex with an alternation and lookahead assertions:
(?<=\()(?![^()]*Coder)[^()]+(?=\))|[^(]+(?=\([^()]*Coder)
RegEx Details:
(?<=\()(?![^()]*Coder)[^()]+(?=\)): Match the string inside (...) if it doesn't contain Coder
|: OR
[^(]+(?=\([^()]*Coder): Match the string before the first ( if we have Coder inside the (...)
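
As a quick check, the pattern can be tried in Python, the language used in the question. This is only a minimal sketch: extract is a hypothetical helper name, and the strip() call is an addition that is not part of the regex, needed because the second branch also matches the space in front of the opening parenthesis. The match is case-sensitive, so Coder must appear with a capital C. Lookaround support varies between SQL engines, so this only shows how the pattern behaves under Python's re module.

```python
import re

# Pattern from the answer: branch 1 takes the text inside (...) when it does
# not contain "Coder"; branch 2 takes the text before "(" when it does.
PATTERN = re.compile(
    r"(?<=\()(?![^()]*Coder)[^()]+(?=\))|[^(]+(?=\([^()]*Coder)"
)


def extract(text):
    """Hypothetical helper: return the matched part of text, or None."""
    match = PATTERN.search(text)
    # strip() removes the space left before "(" when branch 2 matches.
    return match.group(0).strip() if match else None


str_one = "Coder X.Y. (Something Dogs Do X.Y.)"
str_two = "Something Cats Ignore X.Y. (Coder X.Y.)"

print(extract(str_one))  # Something Dogs Do X.Y.
print(extract(str_two))  # Something Cats Ignore X.Y.
```

If the lowercase "coder" should also count, compiling the pattern with re.IGNORECASE is one option in Python.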