Find closing parenthesis with regex in R

Time:12-14

I have several strings with unmatched opening or closing parentheses. I managed to remove an opening parenthesis when there is no closing one, but I cannot remove a closing parenthesis when there is no opening one. I want to leave strings with matching parentheses alone.

string1 = "This (is solved"
string2 = "This is (fine)"
string3 = "This is the problem)"

This is what I used to handle the first problem case (an opening parenthesis but no closing one):

str_remove(data, "[(](?!.*[)])") 

But I cannot seem to turn it around. The following matches every closing parenthesis, not just the one without an opening one:

"(?!.*[(])[)]"

Any ideas are appreciated!

CodePudding user response:

If you do not need to handle nested paired (balanced) parentheses, you can use

gsub("(\\([^()]*\\))|[()]", "\\1", string)

See the regex demo. Details:

  • (\([^()]*\)) - Group 1 (\1 refers to this group value): (, then zero or more chars other than ( and ), and then a ) char
  • | - or
  • [()] - a ( or ) char.

See the R demo:

x <- c("This (is solved", "This is (fine)", "This is the problem)")
gsub("(\\([^()]*\\))|[()]", "\\1", x)
# => [1] "This is solved"      "This is (fine)"      "This is the problem"
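
To see why nesting matters, here is a small sketch of what the non-nested pattern does to a nested input. The sample string is made up for illustration and is not from the question. Because `[^()]*` cannot cross an inner parenthesis, the outer pair is treated as unmatched and stripped:

```r
# Hypothetical input with a balanced, nested pair (not from the question)
nested <- "a (b (c) d) e"

# Group 1 only matches a pair with no parentheses inside it, so the
# outer "(" and ")" fall through to the [()] alternative and are removed.
simple_result <- gsub("(\\([^()]*\\))|[()]", "\\1", nested)
print(simple_result)
# => [1] "a b (c) d e"
```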

If the parentheses can be nested, you can use

gsub("(\\((?:[^()]++|(?1))*\\))|[()]", "\\1", string, perl=TRUE)

See this regex demo. Details:

  • (\((?:[^()]++|(?1))*\)) - Group 1:
    • \( - a ( char
    • (?:[^()]++|(?1))* - zero or more repetitions of either one or more chars other than ( and ) (matched possessively), or the whole Group 1 pattern, recursed
    • \) - a ) char
  • |[()] - or a ( / ) char.
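
A short R sketch of the recursive variant, assuming the possessive `[^()]++` form of the pattern. The variable names and the fourth, nested string are added for illustration and do not come from the question:

```r
x <- c("This (is solved", "This is (fine)", "This is the problem)",
       "a (b (c) d) e")  # last element is a hypothetical nested case

# perl = TRUE is required: (?1) recursion and the possessive ++ are PCRE features
balanced_or_stray <- "(\\((?:[^()]++|(?1))*\\))|[()]"
cleaned <- gsub(balanced_or_stray, "\\1", x, perl = TRUE)

print(cleaned[1:3])
# => [1] "This is solved"      "This is (fine)"      "This is the problem"
print(cleaned[4])
# => [1] "a (b (c) d) e"
```

The nested string comes back unchanged because Group 1 now matches the whole balanced outer pair, so only stray parentheses reach the `[()]` alternative.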