I have a String and want to delete from it all substrings with the following properties:
1. They start with an arbitrary (non-zero) number of opening parentheses.
2. Then follows an arbitrary sequence of word characters (`\w`).
3. Then follows the same number of closing parentheses as there were opening parentheses.
Pure regular expressions cannot match balanced opening and closing parentheses. My first attempt was to find a way to use backreferences dynamically. I know that this is not valid Ruby, but to give you an idea:
strrep = str.gsub(/([(]+) \w+ [)]#{\1.size}/x, '')
Of course the `\1.size` is invalid; but is there a way, using interpolation, to evaluate something based on a backreference?
Another possibility would be to repeatedly use `gsub` in a loop and remove one level of parentheses at a time:
strrep = nil # declared outside the block so it is still visible after the loop
tmpstr = str
loop do
  strrep = tmpstr.gsub(/[(] ([(]\w+[)]) [)]/x, "(\\1)")
  if tmpstr == strrep
    # Only one level of parentheses is left to consider
    strrep = tmpstr.gsub(/[(]\w+[)]/x, '')
    break
  else
    tmpstr = strrep
  end
end
# strrep is now the resulting string
However, this seems to be an overly complicated solution. Any ideas (except, of course, writing my own string parser which loops over each character and counts the parentheses)?
UPDATE:
Example 1:
str = "ab((((cd))))ef((gh))ij(kl)mn"
`strrep` should contain `abefijmn`.
Example 2:
str = "((((abc));def;((ghi)))"
`strrep` should contain `(;def;)`.
CodePudding user response:
As far as I understand, you don't need to parse anything "complex" like arbitrary S-expressions etc. All you're interested in is eliminating things like `(((foo)))` and `((bar))` (they have the same number of opening and closing parens) while keeping things like `(((foo)) bar)` intact.
If this assumption is correct, then a quite simple `gsub` can do the job:
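def delete_parentheses(str)
  # Capture the run of opening and the run of closing parentheses separately
  str.gsub(/(\(+)\w+(\)+)/) do |match|
    # Delete the match only when both runs have the same length
    $1.size == $2.size ? "" : match
  end
end
delete_parentheses("Here ((be)) dragons") # => "Here  dragons"
delete_parentheses("Here ((be) dragons") # => "Here ((be) dragons"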
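Note that `delete_parentheses` leaves a match untouched whenever the two counts differ, so the question's second example would come back unchanged. As a rough sketch of one possible variation (the name `strip_balanced_core` is hypothetical, and it assumes the surplus parentheses should be kept while the balanced core is removed), the block can return the leftover parentheses instead of the whole match:

```ruby
# Hypothetical variant of delete_parentheses: removes the word together
# with as many parenthesis pairs as are balanced around it, and keeps
# whatever opening or closing parentheses are left over.
def strip_balanced_core(str)
  str.gsub(/(\(+)\w+(\)+)/) do
    opens  = $1.size
    closes = $2.size
    balanced = [opens, closes].min
    "(" * (opens - balanced) + ")" * (closes - balanced)
  end
end

puts strip_balanced_core("ab((((cd))))ef((gh))ij(kl)mn") # abefijmn
puts strip_balanced_core("((((abc));def;((ghi)))")       # ((;def;)
puts strip_balanced_core("(((foo)) , bar)")              # ( , bar)
```

For the unbalanced second example this yields `((;def;)` rather than the `(;def;)` given in the question, since the input has one more opening than closing parenthesis.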
CodePudding user response:
In general, to match the strings you described, you need to use regex subroutines:
(\((?:\w+|\g<1>)?\))
See the regex demo.
Details:
- `(\((?:\w+|\g<1>)?\))` - Group 1 (capturing is necessary for recursion purposes):
  - `\(` - a `(` char
  - `(?:\w+|\g<1>)?` - an optional occurrence of one or more word chars or the Group 1 pattern recursed
  - `\)` - a `)` char.
To make it a bit more efficient, consider using an atomic group rather than a non-capturing group:
(\((?>\w+|\g<1>)?\))
    ^^
See the Ruby demo:
puts [
  'ab((((cd))))ef((gh))ij(kl)mn',
  '((((abc));def;((ghi)))',
  '(((foo)) , bar)'
].map {|x| x.gsub(/(\((?:\w+|\g<1>)?\))/, '')}
Output:
abefijmn
((;def;)
( , bar)
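For readability, the same recursive pattern could also be written in extended mode (`/x`) with a named group and the atomic group mentioned above. This is only a sketch: the names `BALANCED_PARENS`, `parens` and `remove_balanced` are made up for illustration and are not part of the answer.

```ruby
# Sketch only: BALANCED_PARENS and remove_balanced are illustrative names.
BALANCED_PARENS = /
  (?<parens>       # named group, so the pattern can call itself
    \(             # an opening parenthesis
    (?>            # atomic group, as suggested for efficiency
      \w+          # a run of word characters
      |
      \g<parens>   # or a nested balanced group, matched recursively
    )?
    \)             # the closing parenthesis for this level
  )
/x

# Removes every balanced parenthesised run from the string.
def remove_balanced(str)
  str.gsub(BALANCED_PARENS, '')
end

puts remove_balanced('ab((((cd))))ef((gh))ij(kl)mn') # abefijmn
puts remove_balanced('((((abc));def;((ghi)))')       # ((;def;)
puts remove_balanced('(((foo)) , bar)')              # ( , bar)
```

Because the inner group is optional, this pattern (like the one-line version) also removes empty pairs such as `()`.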