I am currently matching a string against a regular expression. My pattern is:
"(?<=\p{Alnum}|\p{Punct})(\p{Alnum}+\p{Punct}{1})"
I am matching it with the string:
"https://www.google.com/"
My desired result with the above regex and string is:
https:, www., google., com/
I get all the expected matches except 'https:'. For that one, I get 'ttps:' instead of the required 'https:'.
I can't see where I went wrong. Can anyone help me figure this out?
Answer:
You can use
(?<![^\p{Alnum}\p{Punct}])(\p{Alnum}+\p{Punct})
See the online regex demo.
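The (?<![^\p{Alnum}\p{Punct}]) negative lookbehind matches a position that is not immediately preceded by a character other than an alphanumeric or punctuation character. Because it only fails when such a character is actually present, it also succeeds at the start of the string, where there is no preceding character at all.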
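A minimal sketch of running the pattern above, assuming Java (the post does not say which language is used, but \p{Alnum} and \p{Punct} are Java's POSIX classes). The class and method names are hypothetical; the loop just collects group 1 of every match that Matcher.find() returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlChunks {
    // The answer's pattern, with backslashes escaped for a Java string literal.
    static final Pattern CHUNK =
            Pattern.compile("(?<![^\\p{Alnum}\\p{Punct}])(\\p{Alnum}+\\p{Punct})");

    // Collects capture group 1 of every non-overlapping match, left to right.
    static List<String> chunks(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CHUNK.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [https:, www., google., com/]
        System.out.println(chunks("https://www.google.com/"));
    }
}
```

Note that the lookbehind still refuses to start a match right after a character such as a space, just like the original positive lookbehind did; only the start-of-string case changes.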
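Your original regex used a positive lookbehind, which requires an alphanumeric or punctuation character immediately on the left, so it could never match at the start of the string. The first position where it could succeed was right after the "h", which is why you got 'ttps:'.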
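To see the start-of-string difference directly, here is a small comparison, again assuming Java. It runs both patterns (the question's and the answer's, as written in this post) on the same URL and reports only the first match; the class and helper names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindAtStart {
    // Question's pattern: the positive lookbehind needs a real character on the left.
    static final String ORIGINAL = "(?<=\\p{Alnum}|\\p{Punct})(\\p{Alnum}+\\p{Punct}{1})";
    // Answer's pattern: the negative lookbehind also passes when nothing is on the left.
    static final String FIXED = "(?<![^\\p{Alnum}\\p{Punct}])(\\p{Alnum}+\\p{Punct})";

    // Returns group 1 of the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "https://www.google.com/";
        System.out.println(firstMatch(ORIGINAL, url)); // ttps:
        System.out.println(firstMatch(FIXED, url));    // https:
    }
}
```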
Also note that {1} is always redundant: X{1} matches exactly one X, which is what a bare X already does. You can see more about regex redundancy in my YouTube video "Writing cleaner regular expressions".
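A quick check of the {1} point, assuming Java: with and without the quantifier, the pattern produces the same list of matches on the example URL. The lookbehind is left out here because it plays no part in the comparison, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RedundantQuantifier {
    // Returns every full match (group 0) of regex in input, left to right.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "https://www.google.com/";
        // X{1} means "exactly one X", which is what a bare X already matches.
        List<String> withQuantifier = allMatches("\\p{Alnum}+\\p{Punct}{1}", input);
        List<String> without = allMatches("\\p{Alnum}+\\p{Punct}", input);
        System.out.println(withQuantifier.equals(without)); // true
    }
}
```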