I just want some clarification on a Java Regex problem for removing leading zeros in an IP address

Time:08-27

I'm working to familiarize myself with regex in Java, and it's a bit of a learning curve for me.

So the solution to the problem is as follows:

public static String validate(String ip) {
    // Delete the leading zeros of each number, keeping a single 0 for all-zero numbers
    return ip.replaceAll("(?<=^|\\.)0+(?!\\.|$)", "");
}

I just need clarification on why this regex solution works.

I get that the "?" specifies zero or one instances, the "^" represents the beginning of a string, the "\." is an escaped period, the "$" represents the end of the line, and the "" at the end represents deletion of a character, but I don't understand the regex as a whole. If someone could quickly walk me through what it all means together, that would be greatly appreciated. Thank you!

Answer:

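In my opinion, this regex is a tricky one, because it uses less common constructs: lookahead and lookbehind (together called "lookarounds"). They check what surrounds the current position without consuming any characters.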
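To make "without consuming any characters" concrete, here is a small sketch (the class and method names are made up for illustration) comparing a pattern that consumes the dot with one that only looks behind at it. Since the lookbehind never consumes the dot, replacing the match with "" leaves the dot in place.

```java
public class LookaroundDemo {

    // Consumes the dot: the dot is part of the match, so it is deleted too.
    static String consumeDot(String s) {
        return s.replaceAll("\\.0", "");
    }

    // Zero-width lookbehind: only the 0 is matched; the dot is merely checked.
    static String lookBehindDot(String s) {
        return s.replaceAll("(?<=\\.)0", "");
    }

    public static void main(String[] args) {
        String ip = "10.01.1.1";
        System.out.println(consumeDot(ip));    // prints 101.1.1
        System.out.println(lookBehindDot(ip)); // prints 10.1.1.1
    }
}
```

With the input 10.01.1.1, the first method deletes the dot as well and merges two numbers, while the second one only removes the zero.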

I get that the "?" specifies zero or one instances

That is true, but only when it follows something to match, such as a character, a character class, or a group.

Here, it follows nothing to match: it comes right after the opening parenthesis of a group, which marks the group as a special construct. There's a dedicated javadoc subsection for these constructs (see [1]).
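A quick sketch of the two meanings of "?" in a Java regex, using hypothetical helper names: after a character it is a quantifier, while right after "(" it introduces a special construct such as a lookbehind.

```java
import java.util.regex.Pattern;

public class QuestionMarkDemo {

    // "?" after a character is a quantifier: the "u" is optional.
    static boolean optionalU(String s) {
        return Pattern.matches("colou?r", s);
    }

    // "?" right after "(" starts a special construct: "(?<=a)b" matches a "b"
    // only when the character before it is an "a", without consuming the "a".
    static String bAfterA(String s) {
        return s.replaceAll("(?<=a)b", "_");
    }

    public static void main(String[] args) {
        System.out.println(optionalU("color"));   // true
        System.out.println(optionalU("colour"));  // true
        System.out.println(optionalU("colouur")); // false
        System.out.println(bAfterA("ab cb ab"));  // a_ cb a_
    }
}
```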

In your example, we can find two different constructs:

  1. (?<=^|\\.)
    • This is cited in [1] as "(?<=X)   X, via zero-width positive lookbehind".
    • What is this "positive lookbehind" stuff? Source [2] defines it as: "Asserts that what immediately precedes the current position in the string is X".
    • In your case, we ask to verify that the zeros to remove are either (|) at the very beginning of the input text (^), or just after a dot (\\.).
  2. (?!\\.|$)
    • [1] defines it as "(?!X)   X, via zero-width negative lookahead".
    • [2] explains: "Asserts that what immediately follows the current position in the string is not X".
    • In your context, it ensures that we never remove the last digit of a number: the matched zeros must not be followed by a dot or by the end of the input. That is why a number made only of zeros, such as 000, is reduced to 0 instead of disappearing.
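Putting both lookarounds together, here is a runnable sketch. The snippet in the question appears to have lost a "+" after the "0" (a common copy/paste artifact), so this assumes the intended pattern is (?<=^|\\.)0+(?!\\.|$); the class and method names are illustrative only.

```java
public class LeadingZeros {

    // Assumes the "+" after "0" was intended:
    //   (?<=^|\.)  the zeros must start the string or follow a dot
    //   0+         one or more zeros
    //   (?!\.|$)   the zeros must not be followed by a dot or the end of input,
    //              so the last digit of each number is never removed
    static String stripLeadingZeros(String ip) {
        return ip.replaceAll("(?<=^|\\.)0+(?!\\.|$)", "");
    }

    public static void main(String[] args) {
        String[] samples = {"010.001.000.100", "0.0.0.0", "192.168.001.010"};
        for (String ip : samples) {
            System.out.println(ip + " -> " + stripLeadingZeros(ip));
        }
    }
}
```

Running it should print 10.1.0.100, 0.0.0.0 and 192.168.1.10 as the stripped forms. Note how 000 becomes 0: 0+ first grabs all three zeros, the lookahead then sees a dot and fails, so the engine backtracks to two zeros, whose next character is a 0, and the match succeeds. The zeros inside 100 are never touched, because they are preceded by a digit rather than by the start of input or a dot.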

Sources:

[1] Oracle java.util.regex.Pattern javadoc, "Special constructs" section

[2] Regex tutorial: lookaround
