How do I remove a common substring using Ruby?

Time:10-01

I have read "How do I remove substring after a certain character in a string using Ruby?". This is close, but different.

I have these emails with a mask:

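email1 = 'giovanna.macedo@lojas100.com.br-215000695716b.ct.domain.com.br'
email2 = 'alvaro-neves@stockshop.com-215000695716b.ct.domain.com.br'
email3 = 'filiallojas123@filiallojas.net-215000695716b.ct.domain.com.br'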

I want to remove the substring that follows .br, .com, or .net in each address. The result must be:

email1 = 'giovanna.macedo@lojas100.com.br'
email2 = 'alvaro-neves@stockshop.com'
email3 = 'filiallojas123@filiallojas.net'

CodePudding user response:

You can do that with the method String#[] with an argument that is a regular expression.

r = /.*?\.(?:rb|com|net|br)(?!\.br)/

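'giovanna.macedo@lojas100.com.br-215000695716b.ct.domain.com.br'[r]
  #=> "giovanna.macedo@lojas100.com.br"
'alvaro-neves@stockshop.com-215000695716b.ct.domain.com.br'[r]
  #=> "alvaro-neves@stockshop.com"
'filiallojas123@filiallojas.net-215000695716b.ct.domain.com.br'[r]
  #=> "filiallojas123@filiallojas.net"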

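The regular expression reads as follows: "Match zero or more characters non-greedily (the ? after .*), followed by a period, followed by 'rb' or 'com' or 'net' or 'br', which is not followed by '.br'." The final (?!\.br) is a negative lookahead; it rejects a match ending at '.com' when '.br' comes next, so addresses ending in '.com.br' are kept whole.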
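
To see why the negative lookahead matters, here is a small sketch comparing the pattern with and without it. It assumes the inputs are the real addresses followed by the shared suffix -215000695716b.ct.domain.com.br; the name without_lookahead is introduced only for this comparison.

```ruby
# Assumed inputs: each real address followed by the shared suffix.
email1 = 'giovanna.macedo@lojas100.com.br-215000695716b.ct.domain.com.br'
email2 = 'alvaro-neves@stockshop.com-215000695716b.ct.domain.com.br'

with_lookahead    = /.*?\.(?:rb|com|net|br)(?!\.br)/
without_lookahead = /.*?\.(?:rb|com|net|br)/ # hypothetical variant for comparison

# Without the lookahead the match stops at ".com" and loses the ".br" part.
short = email1[without_lookahead]

# With it, ".com" is rejected because ".br" follows, so the match runs on to ".br".
full = email1[with_lookahead]

# A plain ".com" address is unaffected, since "-" follows it rather than ".br".
plain = email2[with_lookahead]

puts short, full, plain
```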

Alternatively the regular expression can be written in free-spacing mode to make it self-documenting:

r = /
    .*?      # match zero or more characters non-greedily
    \.       # match '.'
    (?:      # begin a non-capture group
      rb     # match 'rb'
      |      # or
      com    # match 'com' 
      |      # or
      net    # match 'net'
      |      # or
      br     # match 'br'
    )        # end non-capture group
    (?!      # begin a negative lookahead
      \.br   # match '.br'
    )        # end negative lookahead
    /x       # invoke free-spacing regex definition mode

CodePudding user response:

This should work for your scenario:

expr = /^(.+\.(?:br|com|net))-[^']+(')$/
str = "email = 'giovanna.macedo@lojas100.com.br-215000695716b.ct.domain.com.br'"
str.gsub(expr, '\1\2')
  #=> "email = 'giovanna.macedo@lojas100.com.br'"
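
As a rough sketch of how this pattern behaves on all three addresses, the snippet below applies it line by line. It assumes the quantifiers in the pattern are + and that each line has the form name = '...'; the lines array is made up for illustration.

```ruby
# Assumed input lines, one assignment per address.
lines = [
  "email1 = 'giovanna.macedo@lojas100.com.br-215000695716b.ct.domain.com.br'",
  "email2 = 'alvaro-neves@stockshop.com-215000695716b.ct.domain.com.br'",
  "email3 = 'filiallojas123@filiallojas.net-215000695716b.ct.domain.com.br'"
]

# Group 1: everything up to the last .br/.com/.net that is directly followed by "-".
# Group 2: the closing quote, kept so the assignment stays well-formed.
expr = /^(.+\.(?:br|com|net))-[^']+(')$/

# Each line matches at most once, so sub is enough here.
cleaned = lines.map { |line| line.sub(expr, '\1\2') }
puts cleaned
```

Because the pattern relies on the "-" that starts the mask, it only works while no real address has a hyphen immediately after one of those endings.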

CodePudding user response:

Use the String#delete_suffix Method

This was tested with Ruby 3.0.2. String#delete_suffix and its bang counterpart were added in Ruby 2.5, and the numbered block parameter _1 used below requires Ruby 2.7 or later, so older versions won't run this as written. Since you're trying to remove the exact same suffix from all your emails, you can simply invoke #delete_suffix! on each of your strings. For example:

common_suffix = "-215000695716b.ct.domain.com.br".freeze
emails = [email1, email2, email3]
emails.each { _1.delete_suffix! common_suffix }

You can then validate your results with:

emails
#=> ["giovanna.macedo@lojas100.com.br", "alvaro-neves@stockshop.com", "filiallojas123@filiallojas.net"]

email1
#=> "giovanna.macedo@lojas100.com.br"

email2
#=> "alvaro-neves@stockshop.com"

email3
#=> "filiallojas123@filiallojas.net"

The array now holds the trimmed values. Because #delete_suffix! modifies each string in place, you can also inspect email1, email2, and email3 individually to confirm that the original objects were changed.
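
The in-place behaviour can be checked with a short, self-contained sketch. It assumes the addresses carry the suffix shown earlier and uses unary plus so the strings are unfrozen even if frozen string literals are enabled; the same_object variable exists only for this check.

```ruby
suffix = '-215000695716b.ct.domain.com.br'

# Unary plus returns an unfrozen string even under frozen_string_literal.
email1 = +'giovanna.macedo@lojas100.com.br-215000695716b.ct.domain.com.br'
email2 = +'alvaro-neves@stockshop.com-215000695716b.ct.domain.com.br'

emails = [email1, email2]
emails.each { |email| email.delete_suffix!(suffix) }

# The array elements and the variables refer to the very same objects,
# so the change is visible through both.
same_object = emails[0].equal?(email1)

puts email1, email2, same_object
```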

String Methods are Usually Faster, But Your Mileage May Vary

Since you're dealing with plain String methods instead of regular expressions, this solution is likely to be faster at scale, although I didn't benchmark all the solutions to compare. If you care about performance, you can measure larger samples using IRB's measure command. It took only 0.000062s to process the strings this way on my system, and String methods generally work faster than regular expressions at large scales. You'll need to do more extensive benchmarking if performance is a core concern, though.
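
If you want numbers of your own, a minimal timing sketch with the standard Benchmark module might look like this. The iteration count is arbitrary and the sample string is assumed; results will differ by machine and Ruby version, so treat it as a starting point rather than a verdict.

```ruby
require 'benchmark'

suffix = '-215000695716b.ct.domain.com.br'
regex  = /.*?\.(?:rb|com|net|br)(?!\.br)/
sample = 'giovanna.macedo@lojas100.com.br-215000695716b.ct.domain.com.br'
n = 100_000 # arbitrary iteration count chosen for illustration

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
by_suffix = Benchmark.realtime { n.times { sample.delete_suffix(suffix) } }
by_regex  = Benchmark.realtime { n.times { sample[regex] } }

puts format('delete_suffix: %.4fs', by_suffix)
puts format('regex:         %.4fs', by_regex)
```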

Making the Call Shorter

You can even make the call shorter if you want. I left it a bit verbose above so you could see what the intent was at each step, but you can trim this to a single one-liner with the following block:

# one method chain, just wrapped to prevent scrolling
[email1, email2, email3].
  map { _1.delete_suffix! "-215000695716b.ct.domain.com.br" }
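
One thing to watch with the map form: #delete_suffix! returns nil when the string does not end with the suffix, so the mapped array can contain nil. The sketch below illustrates this with one assumed masked address and one hypothetical address that has nothing to strip.

```ruby
suffix = '-215000695716b.ct.domain.com.br'

with_suffix    = +'alvaro-neves@stockshop.com-215000695716b.ct.domain.com.br'
without_suffix = +'someone@example.com' # hypothetical address with nothing to strip

# delete_suffix! returns nil when it made no change, so the second entry is nil.
bang_results = [with_suffix, without_suffix].map { |s| s.delete_suffix!(suffix) }

# The non-bang method always returns a string, changed or not.
safe_results = [
  'alvaro-neves@stockshop.com-215000695716b.ct.domain.com.br',
  'someone@example.com'
].map { |s| s.delete_suffix(suffix) }

p bang_results
p safe_results
```

If every string is guaranteed to carry the suffix, the bang version is fine; otherwise the non-bang #delete_suffix is the safer choice inside map.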

Caveats

You Need Fixed-String Suffixes

The main caveat here is that this solution will only work when you know the suffix (or set of suffixes) you want to remove. If you can't rely on the suffixes to be fixed, then you'll likely need to pursue a regex solution in one way or another, even if it's just to collect a set of suffixes.
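
For a known set of suffixes, one possible approach is to look up whichever suffix the string ends with and remove it. This is only a sketch: the helper name strip_known_suffix and the second suffix in the list are hypothetical, not something from the question.

```ruby
# Hypothetical list of known suffixes; only the first appears in the question.
known_suffixes = [
  '-215000695716b.ct.domain.com.br',
  '-example.mask.invalid'
]

# Returns the email without the first known suffix it ends with,
# or the email unchanged when none of them match.
def strip_known_suffix(email, suffixes)
  suffix = suffixes.find { |s| email.end_with?(s) }
  suffix ? email.delete_suffix(suffix) : email
end

puts strip_known_suffix('filiallojas123@filiallojas.net-215000695716b.ct.domain.com.br', known_suffixes)
```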

Dealing with Frozen Strings

Another caveat is that if you've created your code with frozen string literals, you'll need to adjust your code to avoid attempting in-place changes to frozen strings. There's more than one way to do this, but a simple destructuring assignment is probably the easiest to follow given your small code sample. Consider the following:

# assume that the strings in email1 etc. are frozen, but the array
# itself is not; you can't change the strings in-place, but you can
# re-assign new strings to the same variables or the same array
emails = [email1, email2, email3]
email1, email2, email3 =
  emails.map { _1.delete_suffix "-215000695716b.ct.domain.com.br" }

There are certainly other ways to work around frozen strings, but the point is that while the now-common use of the # frozen_string_literal: true magic comment can improve VM performance or memory usage in large programs, it isn't always the best option for string-mangling code. Just keep that in mind, as tools like RuboCop love to enforce frozen strings, and not everyone stops to consider the consequences of such generic advice to the given problem domain.
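
As a sketch of what goes wrong and two of the other ways around it, the snippet below freezes one assumed address explicitly, shows that the bang method raises FrozenError, and then trims it via the non-bang method and via an unfrozen copy. The variable names are made up for illustration.

```ruby
suffix = '-215000695716b.ct.domain.com.br'
frozen_email = 'filiallojas123@filiallojas.net-215000695716b.ct.domain.com.br'.freeze

# In-place modification of a frozen string raises FrozenError.
error_raised =
  begin
    frozen_email.delete_suffix!(suffix)
    false
  rescue FrozenError
    true
  end

# Option 1: the non-bang method returns a new string and leaves the original alone.
trimmed = frozen_email.delete_suffix(suffix)

# Option 2: unary plus (or #dup) yields an unfrozen copy that can be mutated.
copy = +frozen_email
copy.delete_suffix!(suffix)

puts error_raised, trimmed, copy
```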
