Target case-insensitive schemes with URI::regexp?

Time:05-25

I have this code to remove URIs using these schemes:

htmldoc.gsub(/#{URI::regexp(['http', 'https', 'ftp', 'mailto'])}/, '')

However, it won't detect a URI whose scheme is capitalized, such as HTTP or Http, unless I add each variant to the array.

I tried adding the case-insensitive flag i to the regex, but it didn't work.

Any idea how I could achieve this?

Answer:

URI::regexp calls the default parser's make_regexp, which in turn passes the given schemes to Regexp::union. According to its docs:

The patterns can be Regexp objects, in which case their options will be preserved, or Strings.
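A small sketch of what that sentence means in practice, and of why an outer i flag has no effect on an interpolated pattern. The variable names here are illustrative only and do not come from the question:

```ruby
# Regexp.union keeps the flags of each Regexp it is given;
# plain strings are escaped and stay case-sensitive.
from_strings = Regexp.union('http', 'ftp')
from_regexps = Regexp.union(/http/i, /ftp/i)

puts from_strings.match?('HTTP')  # false
puts from_regexps.match?('HTTP')  # true

# Interpolating a Regexp embeds its own flags as well,
# so an i flag on the outer literal does not reach the inner pattern.
inner = /http/
outer = /#{inner}/i

puts inner.to_s            # (?-mix:http)
puts outer.match?('HTTP')  # false
```

The second half is most likely what happened when the i flag was added to the original regex: the pattern returned by URI::regexp carries its own flags into the interpolation.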

Applied to your problem, pass case-insensitive Regexp objects instead of strings:

require 'uri'

# Regexp objects keep their i flag when make_regexp unions them
pattern = URI::regexp([/http/i, /https/i, /ftp/i, /mailto/i])

htmldoc = <<-HTML
<html>
<body>
  <a href="https://example.com">here</a>
  <a href="HTTPS://example.com">here</a>
</body>
</html>
HTML

puts htmldoc.gsub(pattern, '')

Output:

<html>
<body>
  <a href="">here</a>
  <a href="">here</a>
</body>
</html>
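If you would rather keep the schemes as a list of strings, the conversion can be wrapped in a small helper. This is only a sketch: case_insensitive_uri_regexp is a hypothetical name, and calling URI::RFC2396_Parser#make_regexp directly (instead of URI::regexp) is my own choice, assuming a Ruby version that ships that class.

```ruby
require 'uri'

# Hypothetical helper: builds a URI-matching regexp whose scheme part
# ignores case, starting from plain scheme names.
def case_insensitive_uri_regexp(schemes)
  # Escape each name and attach the IGNORECASE flag, which Regexp.union preserves
  patterns = schemes.map { |s| Regexp.new(Regexp.escape(s), Regexp::IGNORECASE) }
  URI::RFC2396_Parser.new.make_regexp(patterns)
end

uri_pattern = case_insensitive_uri_regexp(%w[http https ftp mailto])

text = 'see Http://example.com and MAILTO:someone@example.com now'
stripped = text.gsub(uri_pattern, '')
puts stripped
```

Note that the scheme list is not anchored to a word boundary, so a string like xhttp://example.com would still have its http://example.com part removed.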
Tags: ruby