Regex: return contents of parentheses unless substring is present inside, then return everything out

Time:06-13

I have a series of strings that looks like this:

str_one = "Coder X.Y. (Something Dogs Do X.Y.)"
str_two = "Something Cats Ignore X.Y. (Coder X.Y.)"

I want to be able to extract the contents of the parentheses unless there is a specific substring contained within (let's say "coder"), then I want everything outside the parentheses returned. For the above examples, I want the outputs to look like this:

output_str_one = "Something Dogs Do X.Y."
output_str_two = "Something Cats Ignore X.Y."

What I have so far only captures the substring inside the parentheses when a certain word is present:

import re
re.findall(r'\(([^()]*Something[^()]*)\)', str_one)

That doesn't work for the second example obviously.

I could just split the string on the parentheses and then select the resulting piece that does not contain the substring 'coder', i.e.:

str_one_split = re.split(r"\(|\)", str_one)
str_one_split = list(filter(None, str_one_split))
res = [x for x in str_one_split if "coder" not in x.lower()]
print(res)
>>['Something Dogs Do X.Y.']

But really I was hoping for a single regular expression so that I can add it to an expression being used in SQL.

Any help/guidance would be much appreciated!

CodePudding user response:

You may use this regex with an alternation and lookaround (lookbehind and lookahead) assertions:

(?<=\()(?![^()]*Coder)[^()]+(?=\))|[^(]+(?=\([^()]*Coder)

RegEx Details:

  • (?<=\()(?![^()]*Coder)[^()]+(?=\)): Match string inside (...) if it doesn't contain Coder
  • |: OR
  • [^(]+(?=\([^()]*Coder): Match string before first ( if we have Coder inside the (...)
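To check the pattern against the two strings from the question, here is a minimal Python sketch. It assumes the character classes `[^()]` and `[^(]` each carry a `+` quantifier. The helper name `extract` is hypothetical, and the `.strip()` call is an addition to drop the space that the second alternative leaves before the opening parenthesis. Matching is case-sensitive as written, so "coder" in lower case would not trigger the second branch.

```python
import re

# The answer's pattern, split across two raw strings for readability.
# Assumption: each character class is followed by a "+" quantifier.
PATTERN = re.compile(
    r"(?<=\()(?![^()]*Coder)[^()]+(?=\))"  # inside (...) when "Coder" is absent
    r"|"
    r"[^(]+(?=\([^()]*Coder)"              # text before "(" when "Coder" is inside
)


def extract(text):
    """Hypothetical helper: return the first match, trimmed, or None."""
    match = PATTERN.search(text)
    return match.group(0).strip() if match else None


str_one = "Coder X.Y. (Something Dogs Do X.Y.)"
str_two = "Something Cats Ignore X.Y. (Coder X.Y.)"

print(extract(str_one))
print(extract(str_two))
```

Whether the same expression can be dropped into SQL depends on the regex flavour of the database engine, since not every flavour supports lookbehind and lookahead.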