Regexp.escape adds weird escapes to a plain space

Time:10-04

I stumbled over this problem using the following simplified example:

line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }

My understanding was that, for every String stored in searchstring, the gsub! would leave line empty afterwards. This is indeed the case for many strings, but not for this one:

searchstring =  "D "
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
p line

It turns out that line is still printed as "D " afterwards, i.e. no replacement was performed.

This happens with any searchstring containing a space. Indeed, if I run

p(Regexp.escape(searchstring))

for my example, I see "D\\ " printed, whereas I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?

Some background: in my actual application, from which this simplified example is derived, I just want to do a literal string replacement inside a long string, in the following way:

REPLACEMENTS.each do |from, to|
  line.chomp!
  line.gsub!(Regexp.escape(from)) { to }
end

I'm using Regexp.escape just as a safety measure in case the string being replaced contains a regex metacharacter.

I'm using the Cygwin port of MRI Ruby 2.6.4.

CodePudding user response:

line.gsub!(Regexp.escape(searchstring)) { '' }

My understanding was that, for every String stored in searchstring, the gsub! would leave line empty afterwards.

Your understanding is incorrect. The guarantee in the docs is:

For any string, Regexp.new(Regexp.escape(str))=~str will be true.

This does hold for your example:

Regexp.new(Regexp.escape("D "))=~"D " # => 0

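Note that the guarantee is about a Regexp built from the escaped string. When gsub! is given a String as its pattern, it searches for that string literally instead of compiling it as a regular expression, so the escaped text "D\\ " (D, backslash, space) is never found in "D ". The escaped string therefore has to be turned into a Regexp first, and this is what your code should look like: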

line.gsub!(Regexp.new(Regexp.escape(searchstring))) { '' }
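
To make the difference concrete, here is a minimal sketch comparing the two calls on the example from the question. The variable names (escaped, with_string, with_regexp, string_result) are only illustrative and do not come from the original code.

```ruby
searchstring = "D "
escaped = Regexp.escape(searchstring) # the three characters D, backslash, space

# String pattern: gsub! looks for the escaped text literally, which does not
# occur in "D ", so nothing is replaced and gsub! returns nil.
with_string = searchstring.dup
string_result = with_string.gsub!(escaped) { '' }

# Regexp pattern: the escaped text is compiled first, so it matches "D ".
with_regexp = searchstring.dup
with_regexp.gsub!(Regexp.new(escaped)) { '' }

p with_string   # unchanged
p with_regexp   # empty
```

Building the Regexp explicitly is what lets the escaping do its job; passing the escaped text as a plain String bypasses the regex engine entirely.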

As for why this is the case, there used to be a bug where Regexp.escape would incorrectly handle space characters:

# in Ruby 1.8.4
Regexp.escape("D ") # => "D\\s"

My guess is they tried to keep the fix as simple as possible by replacing 's' with ' '. Technically this does add an unnecessary escape character but, again, that does not break the intended use of the method.
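
One case where the escaped space does make a difference, offered here as an observation and not as the documented reason for the behaviour, is extended mode (Regexp::EXTENDED, the /x flag), where unescaped whitespace in a pattern is ignored. A small sketch, with illustrative variable names:

```ruby
text = "a b"

# In extended mode the unescaped space is dropped, so this pattern is
# effectively /ab/ and does not match "a b".
unescaped = Regexp.new("a b", Regexp::EXTENDED)

# The escaped space survives extended mode and matches a literal space.
escaped = Regexp.new(Regexp.escape("a b"), Regexp::EXTENDED)

p unescaped.match?(text)
p escaped.match?(text)
```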

CodePudding user response:

This happens with any searchstring containing a space. Indeed, if I run

p(Regexp.escape(searchstring))

for my example, I see "D\\ " printed, whereas I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?

This looks to be a bug. In my opinion, whitespace is not a Regexp metacharacter, so there is no need to escape it.

Some background: In my concrete application, where this simplified example is derived from, I just want to do a literal string replacement inside a long string […]

If you want to do literal string replacement, then don't use a Regexp. Just use a literal string:

line.gsub!(from, to)
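
A minimal sketch of the literal-replacement loop, assuming REPLACEMENTS is a Hash of from/to pairs as the question's loop suggests. The helper name replace_literals and the sample data are hypothetical. It keeps the block form from the question, because a replacement passed as a second String argument still has sequences such as \0 interpreted as back-references, while a block's return value is inserted as-is.

```ruby
# Hypothetical helper: applies each from/to pair as a literal replacement.
def replace_literals(line, replacements)
  result = line.chomp
  replacements.each do |from, to|
    # String pattern: `from` is matched literally, no Regexp.escape needed.
    # Block form: `to` is inserted verbatim, even if it contains backslashes.
    result = result.gsub(from) { to }
  end
  result
end

# Sample data, for illustration only.
replacements = { "D " => "X", "a.b" => '\0' }
p replace_literals("D a.b axb\n", replacements)

# For comparison: with a String replacement argument, \0 stands for the match.
p "a.b".gsub("a.b", '\0')
```

Here "a.b" only replaces the literal text a.b and leaves axb alone, which is the behaviour Regexp.escape was meant to guarantee.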