Java regex split string with dashes and space

Time:11-13

I have a string like:

String t = "this is my--test string";

I need to split it on spaces and on --, so I tried:

String[] m = t.split("[\\s -]");

It returns

["this", "is", "my", "", "test", "string"]

but what I actually need is

["this", "is", "my", "--", "test", "string"]

What am I missing? Is this possible?

CodePudding user response:

You can use

String[] result = t.split("\\s+|(?<=--)(?!--)|(?<!--)(?=--)");

See the regex demo. Details:

  • \s+ - one or more whitespace characters
  • | - or
  • (?<=--)(?!--) - a location immediately preceded by -- and not immediately followed by --
  • | - or
  • (?<!--)(?=--) - a location not immediately preceded by -- and immediately followed by --.

See the Java demo:

String regex = "\\s+|(?<=--)(?!--)|(?<!--)(?=--)";
String string = "this is my--test string";
System.out.println(Arrays.toString(string.split(regex)));
// => [this, is, my, --, test, string]
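To see why the character class from the question leaves an empty string while the lookaround pattern keeps --, the two can be compared side by side. This is only a minimal sketch: the class and method names are made up for illustration, and the lookaround pattern is assumed to use \\s+ for the whitespace part.

```java
import java.util.Arrays;

public class SplitComparison {
    // Pattern from the question: every single whitespace character or dash
    // is a delimiter, so the two adjacent dashes leave an empty token between them.
    static String[] splitWithCharClass(String s) {
        return s.split("[\\s -]");
    }

    // Lookaround pattern: whitespace runs are consumed, while the positions
    // just before and just after "--" are zero-width split points, so "--" survives.
    static String[] splitKeepingDashes(String s) {
        return s.split("\\s+|(?<=--)(?!--)|(?<!--)(?=--)");
    }

    public static void main(String[] args) {
        String t = "this is my--test string";
        System.out.println(Arrays.toString(splitWithCharClass(t)));
        // prints [this, is, my, , test, string]
        System.out.println(Arrays.toString(splitKeepingDashes(t)));
        // prints [this, is, my, --, test, string]
    }
}
```

The difference is that the lookarounds match positions rather than characters, so nothing of the -- itself is removed by the split.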

CodePudding user response:

I wasn't able to do it with just regex but does this work?

String t = "this is my--test string";
t = t.replace( "--", " -- " );
String[] m = t.split(" ");
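A slightly more defensive variant of the same idea, as a sketch: if the input already has spaces around --, padding it adds extra spaces and splitting on a single space would produce empty strings, so this version trims and splits on runs of whitespace instead. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class PadAndSplit {
    // Pad every "--" with spaces, then split on runs of whitespace.
    // trim() avoids a leading empty token when the input starts with "--".
    static String[] padAndSplit(String s) {
        return s.replace("--", " -- ").trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(padAndSplit("this is my--test string")));
        // prints [this, is, my, --, test, string]
        System.out.println(Arrays.toString(padAndSplit("a -- b")));
        // prints [a, --, b]
    }
}
```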

CodePudding user response:

You can use a regex matcher to do this:

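import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final String t = "this is my--test string";

        // Match tokens instead of splitting: a run of dashes or a run of word characters.
        final String regex = "(-+)|(\\w+)";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(t);

        // Collect every match in order of appearance.
        List<String> split = new ArrayList<>();
        while (matcher.find()) {
            split.add(matcher.group(0));
        }

        System.out.println(split);
    }
}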

The regex explanation:

  • (-+): matches the character "-" one or more times;
  • |: equivalent to boolean OR;
  • (\\w+): matches any word character (equivalent to [a-zA-Z0-9_]) one or more times.
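On Java 9 or later, the same token-matching idea can be written more compactly with Matcher.results(), which streams the matches. This is a sketch under that version assumption; the class and method names are hypothetical, and the capture groups are dropped because only the whole match is used.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TokenStream {
    // A token is either a run of dashes or a run of word characters.
    private static final Pattern TOKEN = Pattern.compile("-+|\\w+");

    // Matcher.results() (Java 9+) yields one MatchResult per match, in order.
    static List<String> tokens(String s) {
        return TOKEN.matcher(s)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokens("this is my--test string"));
        // prints [this, is, my, --, test, string]
    }
}
```

Because this approach matches tokens rather than delimiters, anything that is neither a dash nor a word character (punctuation, for example) is silently skipped.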

CodePudding user response:

This is slightly simpler than the earlier answer, but note that it treats -- as a delimiter, so the -- itself is dropped from the result rather than returned as a token.

String[] m = t.split("\\s+|\\-\\-");
