Removing parenthesized subexpression

Time:06-22

I have a String and want to delete from it all substrings with the following properties:

1. They start with an arbitrary (non-zero) number of opening parentheses.
2. Then follows one or more word characters (`\w`).
3. Then follows the same number of closing parentheses as there were opening parentheses.

Pure regular expressions cannot match balanced opening and closing parentheses. My first attempt was to find a way to use backreferences dynamically. I know that this is not valid Ruby, but to give you an idea:

strrep = str.gsub(/([(]+) \w+ [)]#{\1.size}/x, '')

Of course the \1.size is invalid, but is there a way, using interpolation, to evaluate something based on a backreference?

Another possibility would be to use gsub repeatedly in a loop and remove one level of parentheses at a time:

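tmpstr = str
strrep = nil # declared outside the block so it is still visible after the loop
loop do
  # Strip one nesting level: "((word))" becomes "(word)"
  strrep = tmpstr.gsub(/[(] ([(]\w+[)]) [)]/x, "\\1")
  if tmpstr == strrep
    # Only one level of parentheses is left to consider
    strrep = tmpstr.gsub(/[(]\w+[)]/x, '')
    break
  else
    tmpstr = strrep
  end
end
# strrep is now the resulting string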
    

However, this seems to be an overly complicated solution. Any ideas (except, of course, writing my own string parser that loops over each character and counts the parentheses)?

UPDATE:

Example1:

str = "ab((((cd))))ef((gh))ij(kl)mn"

strrep should contain abefijmn.

Example2:

str = "((((abc));def;((ghi)))"

strrep should contain ((;def;).

CodePudding user response:

As far as I understand, you don't need to parse anything complex such as arbitrary S-expressions. All you're interested in is eliminating things like (((foo))) and ((bar)), which have the same number of opening and closing parens, while keeping things like (((foo)) bar) intact.

If this assumption is correct, then a fairly simple gsub can do the job:

def delete_parentheses(str)
  str.gsub(/(\(+)\w+(\)+)/) do |match|
    $1.size == $2.size ? "" : match
  end
end

delete_parentheses("Here ((be)) dragons") # => "Here  dragons"
delete_parentheses("Here ((be) dragons")  # => "Here ((be) dragons"
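
Note that the method above leaves a run untouched whenever the two counts differ, so the question's second example would come back unchanged. If the stricter reading from the question is wanted (remove the inner balanced part and keep only the surplus parens), a variant could look like the sketch below. The method name `delete_balanced_parens` is made up for illustration and is not part of the original answer.

```ruby
# Hypothetical variant: drop as many paren pairs as both sides have in
# common and keep only the unmatched surplus on either side.
def delete_balanced_parens(str)
  str.gsub(/(\(+)\w+(\)+)/) do
    opens  = $1.size
    closes = $2.size
    paired = [opens, closes].min
    "(" * (opens - paired) + ")" * (closes - paired)
  end
end

puts delete_balanced_parens("ab((((cd))))ef((gh))ij(kl)mn")
puts delete_balanced_parens("((((abc));def;((ghi)))")
puts delete_balanced_parens("Here ((be) dragons")
```

This stays a single gsub pass with no recursion; the trade-off is that it always removes the word characters, even when the counts differ.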

CodePudding user response:

In general, to match the strings you described, you need to use regex subroutines:

(\((?:\w+|\g<1>)?\))

See the regex demo.

Details:

  • (\((?:\w+|\g<1>)?\)) - Group 1 (capturing is necessary for recursion purposes):
    • \( - a ( char
    • (?:\w+|\g<1>)? - an optional occurrence of one or more word chars or Group 1 pattern recursed
    • \) - a ) char.
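
The breakdown above can also be written directly into the pattern. The sketch below is one possible spelling, assuming a named group instead of the numbered one and Ruby's extended (`/x`) mode; the constant name `BALANCED` and the group name `paren` are invented for illustration.

```ruby
# Same idea as the pattern above, with a named group and inline comments.
BALANCED = /
  (?<paren>        # named group, so it can be called recursively
    \(             # an opening parenthesis
    (?:
      \w+          # either one or more word characters
      |
      \g<paren>    # or the whole group again, one level deeper
    )?
    \)             # the matching closing parenthesis
  )
/x

puts 'ab((((cd))))ef((gh))ij(kl)mn'.gsub(BALANCED, '')
puts '(((foo)) , bar)'.gsub(BALANCED, '')
```

Because the inner alternative is optional, an empty pair `()` is matched (and removed) as well.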

To make it a bit more efficient, consider using an atomic group rather than a non-capturing group:

(\((?>\w+|\g<1>)?\))
    ^^
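
As a quick sanity check, the two spellings can be run side by side. This is only a sketch: the constant names `PLAIN` and `ATOMIC` are made up here, and it compares results on a few sample strings without measuring speed.

```ruby
PLAIN  = /(\((?:\w+|\g<1>)?\))/  # non-capturing inner group
ATOMIC = /(\((?>\w+|\g<1>)?\))/  # atomic inner group, no backtracking into it

samples = [
  'ab((((cd))))ef((gh))ij(kl)mn',
  '((((abc));def;((ghi)))',
  '(((foo)) , bar)'
]

samples.each do |s|
  plain  = s.gsub(PLAIN, '')
  atomic = s.gsub(ATOMIC, '')
  puts "#{s.inspect} -> #{atomic.inspect} (same as non-atomic: #{plain == atomic})"
end
```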

See the Ruby demo:

puts [
    'ab((((cd))))ef((gh))ij(kl)mn',
    '((((abc));def;((ghi)))',
    '(((foo)) , bar)'
].map {|x| x.gsub(/(\((?:\w+|\g<1>)?\))/, '')}

Output:

abefijmn
((;def;)
( , bar)