Java Regex missing a match in the output

Time:03-31

I am currently matching a string against a regular expression. My pattern is:

"(?<=\p{Alnum}|\p{Punct})(\p{Alnum}+\p{Punct}{1})"

I am matching it with the string:

"https://www.google.com/"

My desired result with the above regex and string is:

https:, www., google., com/

I get all the matches except 'https:'. For that one, the output is 'ttps:' instead of the expected 'https:'.

I can't see where I went wrong. Can anyone help me figure this out?

CodePudding user response:

You can use

(?<![^\p{Alnum}\p{Punct}])(\p{Alnum}+\p{Punct})

See the online regex demo.

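The (?<![^\p{Alnum}\p{Punct}]) negative lookbehind matches a location that is not immediately preceded by a char other than an alphanumeric or punctuation char. In other words, the position is either the start of the string or comes right after an alphanumeric or punctuation char.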

Note that your regex required an alphanumeric or punctuation char immediately on the left, so it could never match at the start of the string. That is why the leading 'h' was skipped and the first match came out as 'ttps:'.
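
To see the difference side by side, here is a minimal sketch that runs both patterns against the URL from the question. The class name RegexDemo and the helper findAll are made up for illustration, and the sketch assumes the quantifier after \p{Alnum} is +, which the expected output implies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class RegexDemo {
    // Pattern from the question: the positive lookbehind needs a char on the left.
    static final String ORIGINAL = "(?<=\\p{Alnum}|\\p{Punct})(\\p{Alnum}+\\p{Punct}{1})";
    // Pattern from the answer: the negative lookbehind also succeeds at the start of the string.
    static final String FIXED = "(?<![^\\p{Alnum}\\p{Punct}])(\\p{Alnum}+\\p{Punct})";

    // Collects group 1 of every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        String url = "https://www.google.com/";
        System.out.println(findAll(ORIGINAL, url)); // [ttps:, www., google., com/]
        System.out.println(findAll(FIXED, url));    // [https:, www., google., com/]
    }
}
```

With the original pattern the engine cannot start a match at index 0, so it begins at the 't'. With the negative lookbehind, index 0 is a valid starting position and the full 'https:' is returned.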

Also, {1} is always redundant: a pattern element with no quantifier already matches exactly once. You can see more about regex redundancy in the "Writing cleaner regular expressions" YT video of mine.
