How do I implement lookahead and non lookahead in the same regex statement?

Time:06-09

How do I implement both a lookahead (without replacement), and a non-lookahead in the same regex statement?

I want to split up a sentence such as:

"ad1 cow run sick ag2 4 8 6 9 crap2 ag lag pag arg2 8 6 5"

into

ad1 cow run sick
ag2 4 8 6 9
crap2 ag lag pag
arg2 8 6 5

Here is the statement that almost gets me there with a lookahead:

"(?=\\s\\w\\w*\\d)"

That is, it looks for a whitespace character, a word character, any number of further word characters, and then a digit. Here is what I get with that:

ad1 cow run sick
 ag2 4 8 6 9
 crap2 ag lag pag
 arg2 8 6 5

Notice that the leading spaces are still there, since the lookahead does not consume them. How do I remove those spaces as well in the same regex statement?

CodePudding user response:

You can move the whitespace matching pattern out of the lookahead:

"\\s+(?=\\w+\\d)"

This way, the whitespaces will get consumed and thus removed during splitting.

Details

  • \s+ - one or more whitespace characters
  • (?=\w+\d) - a positive lookahead that matches a location immediately followed by one or more word chars and then a digit.
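
The difference between the two patterns can be checked with a short sketch. The thread does not name a language, so Python is used here as an assumption; the regex syntax is the same, only the string escaping differs (a raw string instead of doubled backslashes). The variable names are illustrative only.

```python
import re

sentence = "ad1 cow run sick ag2 4 8 6 9 crap2 ag lag pag arg2 8 6 5"

# Lookahead only: the split point is zero-width, so the whitespace is
# not consumed and stays at the start of each later piece.
# (Splitting on empty matches needs Python 3.7 or newer.)
with_spaces = re.split(r"(?=\s\w+\d)", sentence)

# Whitespace moved out of the lookahead: it is part of the match,
# so split() drops it from the result.
parts = re.split(r"\s+(?=\w+\d)", sentence)

for part in parts:
    print(part)
```

Only the whitespace in front of a token like `ag2` is consumed; the spaces between the other words never satisfy the lookahead and stay inside each piece.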


CodePudding user response:

You can also use your pattern to match the pieces instead of splitting (note that \\w\\w* can be written as \\w+):

\\w+\\d.*?(?=\\s\\w+\\d|$)

Explanation

  • \\w+\\d Match 1+ word chars and a digit
  • .*? Match as few characters as possible
  • (?= Positive lookahead
    • \\s\\w+\\d Match a whitespace char, 1+ word chars and a digit
    • | Or
    • $ Assert the end of the string
  • ) Close lookahead
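
A minimal sketch of the matching approach, again in Python as an assumption (the thread does not say which language is in use). Instead of splitting, it collects every match of the pattern; `sentence`, `pattern` and `matches` are illustrative names.

```python
import re

sentence = "ad1 cow run sick ag2 4 8 6 9 crap2 ag lag pag arg2 8 6 5"

# Each match starts at a token like "ad1" (word chars ending in a digit)
# and lazily extends until the next such token, or the end of the string,
# is directly ahead. The whitespace before the next token is only
# looked at, never included in the match.
pattern = re.compile(r"\w+\d.*?(?=\s\w+\d|$)")

matches = pattern.findall(sentence)

for match in matches:
    print(match)
```

For this input the result is the same four pieces as with splitting. The matching form can be handier when text before the first digit-suffixed token should be ignored rather than returned as a leading piece.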
