How do I implement both a lookahead (without replacement), and a non-lookahead in the same regex statement?
I want to split up a sentence such as:
"ad1 cow run sick ag2 4 8 6 9 crap2 ag lag pag arg2 8 6 5"
into
ad1 cow run sick
ag2 4 8 6 9
crap2 ag lag pag
arg2 8 6 5
Here is the statement that almost gets me there with a lookahead:
"(?=\\s\\w\\w*\\d)"
That is, it looks for a position followed by a whitespace character, a word character, any number of further word characters, and then a digit. Here is what I get with that:
"ad1 cow run sick"
" ag2 4 8 6 9"
" crap2 ag lag pag"
" arg2 8 6 5"
Notice that the leading spaces (shown here inside the quotes) are still there, because the lookahead does not consume them. How do I remove those spaces as well in the same regex statement?
CodePudding user response:
You can move the whitespace matching pattern out of the lookahead:
"\\s+(?=\\w+\\d)"
This way, the whitespace is consumed by the match and is therefore removed during splitting.
Details
\s+ - one or more whitespace characters
(?=\w+\d) - a positive lookahead that matches a location that is immediately followed by one or more word chars and then a digit.
See the regex demo.
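As a minimal sketch of the difference, assuming the question is about Java's String.split (the doubled backslashes suggest Java, but the thread does not say so), the following compares the lookahead-only pattern from the question with the consuming pattern above. The class and method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative and not from the thread.
final class SplitOnWhitespaceDemo {

    // Pattern from the question: the whole match is zero-width,
    // so the whitespace stays at the start of each following piece.
    static String[] splitLookaheadOnly(String input) {
        return input.split("(?=\\s\\w\\w*\\d)");
    }

    // Pattern from this answer: the whitespace is matched outside the
    // lookahead, so split() consumes it and drops it from the result.
    static String[] splitConsumingWhitespace(String input) {
        return input.split("\\s+(?=\\w+\\d)");
    }

    public static void main(String[] args) {
        String input = "ad1 cow run sick ag2 4 8 6 9 crap2 ag lag pag arg2 8 6 5";

        // [ad1 cow run sick,  ag2 4 8 6 9,  crap2 ag lag pag,  arg2 8 6 5]
        System.out.println(Arrays.toString(splitLookaheadOnly(input)));

        // [ad1 cow run sick, ag2 4 8 6 9, crap2 ag lag pag, arg2 8 6 5]
        System.out.println(Arrays.toString(splitConsumingWhitespace(input)));
    }
}
```

The only difference between the two calls is where the whitespace sits: inside the lookahead it is merely checked, outside it becomes part of the delimiter.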
CodePudding user response:
You can also use your pattern as a match (note that \\w\\w* can be written as \\w+):
\\w+\\d.*?(?=\\s\\w+\\d|$)
Explanation
\\w+\\d - Match 1+ word chars and a digit
.*? - Match as few characters as possible
(?= - Positive lookahead
\\s\\w+\\d - Match a whitespace char, 1+ word chars and a digit
| - Or
$ - Assert the end of the string
) - Close lookahead
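A minimal sketch of the matching approach, again assuming Java: instead of splitting, it collects every match of the pattern with Matcher.find(). The class name, method name, and the RECORD constant are hypothetical and only serve the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative and not from the thread.
final class MatchRecordsDemo {

    // A record starts with word chars ending in a digit and runs lazily up to
    // the next whitespace + such token, or up to the end of the string.
    private static final Pattern RECORD =
            Pattern.compile("\\w+\\d.*?(?=\\s\\w+\\d|$)");

    static List<String> findRecords(String input) {
        List<String> records = new ArrayList<>();
        Matcher matcher = RECORD.matcher(input);
        while (matcher.find()) {
            // The separating whitespace is only looked at, never matched,
            // so it is not part of any record.
            records.add(matcher.group());
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "ad1 cow run sick ag2 4 8 6 9 crap2 ag lag pag arg2 8 6 5";
        // [ad1 cow run sick, ag2 4 8 6 9, crap2 ag lag pag, arg2 8 6 5]
        System.out.println(findRecords(input));
    }
}
```

Compared with splitting, this variant skips any text before the first token that ends in a digit, which may or may not be what you want.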