I stumbled over this problem using the following simplified example:
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
My understanding was that, for every String stored in searchstring, the gsub! would leave line empty afterwards. Indeed, this is the case for many strings, but not for this one:
searchstring = "D "
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
p line
It turns out that line is printed as "D " afterwards, i.e. no replacement has been performed. This happens with any searchstring containing a space. Indeed, if I do a
p(Regexp.escape(searchstring))
for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?
Some background: In my concrete application, from which this simplified example is derived, I just want to do a literal string replacement inside a long string, in the following way:
REPLACEMENTS.each do |from, to|
  line.chomp!
  line.gsub!(Regexp.escape(from)) { to }
end
I'm using Regexp.escape just as a safety measure in case the string being replaced contains some regex metacharacter.
I'm using the Cygwin port of MRI Ruby 2.6.4.
CodePudding user response:
line.gsub!(Regexp.escape(searchstring)) { '' }
My understanding was, that for every String stored in searchstring, the gsub! would cause that line is afterwards empty.
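Your understanding is incorrect. The guarantee in the docs is:
For any string, Regexp.new(Regexp.escape(str))=~str will be true.
This does hold for your example:
Regexp.new(Regexp.escape("D "))=~"D " # => 0
However, gsub! matches a String pattern literally instead of compiling it to a Regexp, so the escaped string (with its backslash) is never found in line. Therefore, this is what your code should look like: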
line.gsub!(Regexp.new(Regexp.escape(searchstring))) { '' }
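The difference between the two calls can be checked directly. The following is a minimal sketch using only core Ruby; the variable names literal_attempt and regexp_attempt are illustrative and do not come from the question.

```ruby
searchstring = "D "

# String pattern: gsub! searches for the literal characters D, backslash, space.
# That text does not occur in the line, so nothing is replaced.
literal_attempt = searchstring.dup
literal_attempt.gsub!(Regexp.escape(searchstring)) { '' }

# Regexp pattern: the escaped text is compiled, so the escaped space matches a space.
regexp_attempt = searchstring.dup
regexp_attempt.gsub!(Regexp.new(Regexp.escape(searchstring))) { '' }

p literal_attempt  # prints "D "
p regexp_attempt   # prints ""
```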
As for why the space gets escaped in the first place, there used to be a bug where Regexp.escape would incorrectly handle space characters:
# in Ruby 1.8.4
Regexp.escape("D ") # => "D\\s"
My guess is they tried to keep the fix as simple as possible by replacing 's' with ' '. Technically this does add an unnecessary escape character but, again, that does not break the intended use of the method.
CodePudding user response:
This happens to any searchstring containing a space. Indeed, if I do a p(Regexp.escape(searchstring)) for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?
This looks to be a bug. In my opinion, whitespace is not a Regexp metacharacter, so there is no need to escape it.
Some background: In my concrete application, where this simplified example is derived from, I just want to do a literal string replacement inside a long string […]
If you want to do literal string replacement, then don't use a Regexp. Just use a literal string:
line.gsub!(from, to)
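As a minimal sketch of that approach applied to the loop from the question: the replacements table and the sample line below are made up for illustration, and the lowercase local replacements merely stands in for the question's REPLACEMENTS. The block form is kept from the question on the assumption that the replacement text might contain backslashes; in the two-argument form, sequences such as \0 or \1 in the replacement string are treated as back-references.

```ruby
# Hypothetical replacement table; the real REPLACEMENTS is not shown in the question.
replacements = {
  "D "  => "X",
  "a.b" => "a-b"   # "." is matched literally, not as "any character"
}

line = "D a.b axb\n"
line.chomp!

replacements.each do |from, to|
  # A String pattern is matched literally, so no escaping is needed.
  line.gsub!(from) { to }
end

p line  # prints "Xa-b axb"
```

Note that "axb" is left alone, which would not be the case if "a.b" were interpreted as a regular expression.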