In Java (JDK 11), consider the following string:
String hello = "333+444 5qwerty5 006 -7";
I am trying to come up with a regex that splits on anything that isn't a digit, keeping the separators as array elements except for spaces. So in the above example, I would like to end up with the following array:
["333" , "+" , "444" , "5" , "q" , "w" , "e" , "r" , "t" , "y" , "5" , "006" , "-7"]
Note that the leading zeroes in 006 and the minus sign in -7 must be kept. The code I am using is the following:
String[] splited = s.split("((?<=[^0-9]+)|(?=[^0-9]+)|(\\s+))");
However, I can see that my array still contains spaces. I can't for the life of me figure out my mistake. Any thoughts?
EDIT: Turns out the requirement is even more complex than I thought:
["333+444" , "5" , "q" , "w" , "e" , "r" , "t" , "y" , "5" , "006" , "-7"]
So if there is no space between an integer and one of the operators + - * / % ^, then they should not be split. I am having trouble implementing this rule together with the requirement that leading zeroes and negative numbers are not split.
Answer:
Instead of using split, you could also match all the parts:
-?\d+|\S
The pattern matches:
- -? : optionally match a hyphen
- \d+ : match 1+ digits
- | : or
- \S : match a single non-whitespace char
Example
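import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "-?\\d+|\\S";
String string = "333+444 5qwerty5 006 -7";
List<String> allMatches = new ArrayList<String>();
Matcher m = Pattern.compile(regex).matcher(string);
while (m.find()) {
    // Each match is one token; whitespace is never matched, so it is dropped
    allMatches.add(m.group());
}
System.out.println(Arrays.toString(allMatches.toArray()));
Output
[333, +, 444, 5, q, w, e, r, t, y, 5, 006, -7]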
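The matching approach might also be extended to the rule from the question's edit, where an integer and an operator with no space between them stay together. The following is only a sketch under assumptions: the class name, the method name, and the extended pattern are hypothetical and not part of the original answer, and the input is assumed to be "333+444 5qwerty5 006 -7".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperatorAwareTokenizer {
    // Hypothetical pattern (an assumption, not from the original answer):
    //   -?\d+                 an integer with an optional leading minus
    //   (?:[-+*/%^]-?\d+)*    zero or more "operator, integer" pairs with no space in between
    //   |\S                   otherwise, a single non-whitespace character
    private static final Pattern TOKEN =
            Pattern.compile("-?\\d+(?:[-+*/%^]-?\\d+)*|\\S");

    // Collects every match in order; whitespace is never matched, so it is dropped.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [333+444, 5, q, w, e, r, t, y, 5, 006, -7]
        System.out.println(tokenize("333+444 5qwerty5 006 -7"));
    }
}
```

One consequence of this design is that an input such as 5-7 is kept as the single token 5-7, since the minus is read as an operator attached to the preceding integer. Whether that is the desired behaviour depends on the exact requirement.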
Answer:
This works for your example:
String[] split = hello.split("(?<=\\d)(?=\\D) *|(?<=[^\\d -])(?=[\\d-])|(?<=[\\d-])(?=[^\\d -])|(?<=[^\\d -])(?=[^\\d -])");
The important parts are:
- Using [\d-] instead of \d so minus signs are treated as "digits"
- Generally using [^\d -] instead of \D to prevent empty split elements at word ends
- Splitting after digits, but only if a non-digit follows
- Adding " *" (a space followed by *) to capture ("delete") spaces when splitting
- Splitting between non-digits
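To make the individual alternatives easier to follow, the same regex can be assembled from named pieces. This is only an illustrative sketch: the class and constant names are made up for this example, the joined pattern is meant to be character-for-character the regex shown above, and the input is assumed to be "333+444 5qwerty5 006 -7".

```java
import java.util.Arrays;

public class SplitRegexParts {
    // Names are hypothetical; each constant is one alternative of the regex above.
    // After a digit, before a non-digit; " *" swallows any spaces at the split point.
    static final String AFTER_DIGIT = "(?<=\\d)(?=\\D) *";
    // After a non-digit (other than space or minus), before a digit or minus.
    static final String BEFORE_NUMBER = "(?<=[^\\d -])(?=[\\d-])";
    // After a digit or minus, before a non-digit (other than space or minus).
    static final String AFTER_DIGIT_OR_MINUS = "(?<=[\\d-])(?=[^\\d -])";
    // Between two non-digits, neither of which is a space or minus.
    static final String BETWEEN_NON_DIGITS = "(?<=[^\\d -])(?=[^\\d -])";

    static final String REGEX = String.join("|",
            AFTER_DIGIT, BEFORE_NUMBER, AFTER_DIGIT_OR_MINUS, BETWEEN_NON_DIGITS);

    static String[] split(String input) {
        return input.split(REGEX);
    }

    public static void main(String[] args) {
        // prints [333, +, 444, 5, q, w, e, r, t, y, 5, 006, -7]
        System.out.println(Arrays.toString(split("333+444 5qwerty5 006 -7")));
    }
}
```

Because a minus sign is excluded from every "non-digit" class, no alternative can match between the minus and the 7 in -7, which is why the negative number survives as one element.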
Test code:
String hello = "333+444 5qwerty5 006 -7";
String[] split = hello.split("(?<=\\d)(?=\\D) *|(?<=[^\\d -])(?=[\\d-])|(?<=[\\d-])(?=[^\\d -])|(?<=[^\\d -])(?=[^\\d -])");
System.out.println(Arrays.toString(split));
Output:
[333, +, 444, 5, q, w, e, r, t, y, 5, 006, -7]