I have this code to remove URIs using these schemes:
htmldoc.gsub(/#{URI::regexp(['http', 'https', 'ftp', 'mailto'])}/, '')
However, it won't detect a capitalized URI like HTTP or Http unless I add them to the array.
I tried adding the case-insensitive flag i to the regex, but it didn't work.
Any idea how I could achieve this?
CodePudding user response:
URI::regexp calls the default parser's make_regexp, which in turn passes the given arguments to Regexp::union. According to its docs:
The patterns can be Regexp objects, in which case their options will be preserved, or Strings.
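To see that behaviour in isolation, here is a minimal sketch that calls Regexp.union directly. The variable names are made up for illustration; only Regexp.union itself comes from the explanation above.

```ruby
# Strings are escaped and joined as-is, so the result is case-sensitive.
from_strings = Regexp.union('http', 'ftp')

# Regexp objects keep their own options, so each alternative stays case-insensitive.
from_regexps = Regexp.union(/http/i, /ftp/i)

puts from_strings.match?('HTTP')  # false
puts from_regexps.match?('HTTP')  # true
```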
Applied to your problem:
require 'uri'

# Pass Regexp objects instead of Strings so the i option survives Regexp::union
pattern = URI::regexp([/http/i, /https/i, /ftp/i, /mailto/i])
htmldoc = <<-HTML
<html>
<body>
<a href="https://example.com">here</a>
<a href="HTTPS://example.com">here</a>
</body>
</html>
HTML
puts htmldoc.gsub(pattern, '')
Output:
<html>
<body>
<a href="">here</a>
<a href="">here</a>
</body>
</html>
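As for why adding the i flag to the outer regex had no effect: when a Regexp is interpolated into another regex literal, it is embedded together with its own option group, and that group explicitly switches case-insensitivity off for the embedded part. A small sketch with placeholder patterns (inner and outer are illustrative names, and inner only stands in for whatever URI::regexp returns):

```ruby
inner = /http/        # stands in for the regexp returned by URI::regexp
outer = /#{inner}/i   # the outer i flag does not reach the embedded part

puts inner.to_s            # (?-mix:http)
puts outer.match?('HTTP')  # false
puts outer.match?('http')  # true
```

This is why the flag has to be set on the scheme patterns themselves, as in the answer above, rather than on the regex that wraps them.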