Splitting String by column with regex

Time:06-13

4 
1 1
1 2 1
0
1 1

This is a String I get as input, but I only need the values from the second column onward, i.e.:

  • 1 (second row)
  • 2 and 1 (third row)
  • 1 (fifth row)

The String has no fixed number of lines or columns (columns are separated by a single space).

I think this is fairly easy by using:

string.split("enter regex here");

I need every column after the first. I'm still learning with regex but I just can't seem to find a good solution. I know about "\\r?\\n" and " " for splitting but don't know how to connect both to get every column. Any help is very appreciated :)

Another String could look like this:

2
1
1 2
9 3 5
1 3
0 9 2 4
0

In that case, I would need 2, 3, 5, 3, 9, 2, 4.

CodePudding user response:

First remove the leading column, then split on whitespace:

String[] split = str.replaceAll("(?m)^\\d+\\s*", "").split("\\s+");

See live demo.

The replacement uses the multiline flag (?m), which makes ^ match at the start of every line. \d+ matches the first column and \s* matches the whitespace after it, so the first column is deleted from every line. Because \s also matches newlines, a line that has only one column is deleted entirely, line break included. In lines with more than one column, the remaining columns and the line break are retained.

Because \s matches both spaces and newlines, the split separates columns within a line as well as across the remaining line breaks, yielding the desired result.

I believe this is the least code required for a solution.
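
As a minimal sketch of the one-liner above applied to both inputs from the question: the class name TrimFirstColumnDemo and the method name columnsAfterFirst are made up for illustration, and the empty-result guard is an addition that is not part of the original answer (split on an empty string would otherwise return a single empty element).

```java
import java.util.Arrays;

class TrimFirstColumnDemo {

    // Hypothetical helper: deletes the first column of every line,
    // then splits whatever is left on runs of whitespace.
    static String[] columnsAfterFirst(String input) {
        String rest = input.replaceAll("(?m)^\\d+\\s*", "");
        // Assumption: when every line has only one column, return an empty array
        // instead of the single empty string that split() would produce.
        return rest.isEmpty() ? new String[0] : rest.split("\\s+");
    }

    public static void main(String[] args) {
        String first = "4 \n1 1\n1 2 1\n0\n1 1";
        String second = "2\n1\n1 2\n9 3 5\n1 3\n0 9 2 4\n0";
        System.out.println(Arrays.toString(columnsAfterFirst(first)));  // [1, 2, 1, 1]
        System.out.println(Arrays.toString(columnsAfterFirst(second))); // [2, 3, 5, 3, 9, 2, 4]
    }
}
```

Note that this relies on every line starting directly with its first column; leading indentation would need to be stripped first.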

CodePudding user response:

You can use the following regex:

(?<=\d )\d+

It matches any run of digits that is preceded by a digit and a space. The first column of a line is never preceded by "digit space", so it is skipped.

Instead of splitting on this, you should use the matcher with this regex.

Check the demo here.
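
A rough sketch of how the matcher approach might look in Java, assuming the lookbehind regex (?<=\d )\d+ and the two sample inputs from the question. The class name LookbehindColumnsDemo and the method name columnsAfterFirst are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindColumnsDemo {

    // A number is matched only when a digit and a single space come directly before it.
    private static final Pattern AFTER_FIRST = Pattern.compile("(?<=\\d )\\d+");

    // Hypothetical helper: collects every match instead of splitting on the regex.
    static List<String> columnsAfterFirst(String input) {
        List<String> values = new ArrayList<>();
        Matcher matcher = AFTER_FIRST.matcher(input);
        while (matcher.find()) {
            values.add(matcher.group());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(columnsAfterFirst("4 \n1 1\n1 2 1\n0\n1 1"));           // [1, 2, 1, 1]
        System.out.println(columnsAfterFirst("2\n1\n1 2\n9 3 5\n1 3\n0 9 2 4\n0")); // [2, 3, 5, 3, 9, 2, 4]
    }
}
```

Collecting matches rather than splitting avoids empty array elements, since only the wanted numbers are ever produced.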

CodePudding user response:

You can use String.lines to get a stream of the lines, split each line at every space with Pattern.splitAsStream, skip the first column, flatMap the remaining values into a single stream, and join them using a comma as the delimiter:

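import java.util.regex.Pattern;
import java.util.stream.Collectors;

String input = "4 \n"
             + "1 1\n"
             + "1 2 1\n"
             + "0\n"
             + "1 1\n";

Pattern pattern = Pattern.compile(" ");
// String.lines() requires Java 11 or later
String result   = input.lines()
                       .flatMap(line -> pattern.splitAsStream(line).skip(1)) // drop the first column of each line
                       .collect(Collectors.joining(", "));

System.out.println(result);

//1, 2, 1, 1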

CodePudding user response:

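String s = "4 \n"
         + "1 1\n"
         + "1 2 1\n"
         + "0\n"
         + "1 1\n";
// Step 1 removes the first digit of each line (with the preceding line break) and every space.
// Step 2 inserts ", " between the remaining digits.
// Note: this only works when every value is a single digit.
String result = s.replaceAll("((^|\\n)\\d|[ ])", "").replaceAll("(\\d)(?=\\d)", "$1, ");
System.out.println(result);
//1, 2, 1, 1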

CodePudding user response:

You could use the following regex, which first captures a number followed by a space and then captures any sequence of numbers, each followed either by a space or by nothing. The second capturing group holds the rest of the line, which is the part you're interested in.

(\d+) ((\d+( |))+)

Here is an implementation:

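import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "4 \n" +
        "1 1\n" +
        "1 2 1\n" +
        "0\n" +
        "1 1";

Pattern pattern = Pattern.compile("(\\d+) ((\\d+( |))+)");
Matcher matcher = pattern.matcher(str);

// group(1) is the first column; group(2) holds the remaining columns of that line
while (matcher.find()) {
    System.out.println(matcher.group(2));
}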

Here is a link to test the code above for both inputs:

https://www.jdoodle.com/iembed/v0/s92

Output

1
2 1
1

2
3 5
3
9 2 4

Here is also a link to test the regex:

https://regex101.com/r/z1plcG/1
