Using regex to select 3 groups from a string

Time:11-18

String s = #Section250342,Main,First/HS/12345/Jack/M,200010 10.00 200011 -2.00,
#Section250322,Main,First/HS/12345/Aaron/N,200010 17.00,
#Section250399,Main,First/HS/12345/Jimmy/N,200010 12.00,
#Section251234,Main,First/HS/12345/Jack/M,200011 11.00

Wherever /Jack/M appears in the string, I want to pull the section numbers (250342, 251234), the dates (200010, 200011) and the values (10.00, 11.00, -2.00) associated with it, using regex. Sometimes a single line contains one date/value pair and sometimes two, which is what makes the regex confusing. In the end there are 3 different groups we want to extract.

I tried

#Section(\d+)(?:(?!#Section\d).)*\bJack/M,(\d+)\h+(\d+(?:\.\d+)?)\s(\d+)\h+([-+]?\d+(?:\.\d+)?)\b

See it in action here: https://regex101.com/r/JaKeGg/1. It produces 5 groups instead of 3, and it doesn't match when a line has only one value, so I need help with this.

CodePudding user response:

You could use a pattern with 2 capture groups, and then post-process the value of capture group 2 to combine the numbers that should be grouped together.

As the dates and the values in the example strings seem to come in pairs, you can split the group 2 value on whitespace and build 2 groups, using the modulo operator to separate the even and odd positions.

#Section(\d+)\b(?:(?!#Section\d).)*\bJack/M,(\d+\h+[-+]?\d+(?:\.\d+)?(?:\s+\d+\h+[-+]?\d+(?:\.\d+)?)*)


import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String regex = "#Section(\\d+)\\b(?:(?!#Section\\d).)*\\bJack/M,(\\d+\\h+[-+]?\\d+(?:\\.\\d+)?(?:\\s+\\d+\\h+[-+]?\\d+(?:\\.\\d+)?)*)";
        String string = "#Section250342,Main,First/HS/12345/Jack/M,200010 10.00 200011 -2.00,\n"
                + "#Section250322,Main,First/HS/12345/Aaron/N,200010 17.00,\n"
                + "#Section250399,Main,First/HS/12345/Jimmy/N,200010 12.00,\n"
                + "#Section251234,Main,First/HS/12345/Jack/M,200011 11.00";

        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(string);

        while (matcher.find()) {
            // Dates and values found on the current matching line
            List<String> group2 = new ArrayList<>();
            List<String> group3 = new ArrayList<>();

            System.out.println("Group 1: " + matcher.group(1));
            // Group 2 holds "date value date value ..."; split it on whitespace
            String[] parts = matcher.group(2).split("\\s+");
            for (int i = 0; i < parts.length; i++) {
                if (i % 2 == 0) {
                    group2.add(parts[i]); // even index: date
                } else {
                    group3.add(parts[i]); // odd index: value
                }
            }
            System.out.println("Group 2: " + Arrays.toString(group2.toArray()));
            System.out.println("Group 3: " + Arrays.toString(group3.toArray()));
        }
    }
}

Output

Group 1: 250342
Group 2: [200010, 200011]
Group 3: [10.00, -2.00]
Group 1: 251234
Group 2: [200011]
Group 3: [11.00]

If you want to collect all values together, you can create the 3 lists before the loop and print them after the loop finishes.

List<String> group1 = new ArrayList<>();
List<String> group2 = new ArrayList<>();
List<String> group3 = new ArrayList<>();

while (matcher.find()) {
    group1.add(matcher.group(1));
    String[] parts = matcher.group(2).split("\\s+");
    for (int i = 0; i < parts.length; i++) {
        if (i % 2 == 0) {
            group2.add(parts[i]);
        } else {
            group3.add(parts[i]);
        }
    }
}
System.out.println("Group 1: " + Arrays.toString(group1.toArray()));
System.out.println("Group 2: " + Arrays.toString(group2.toArray()));
System.out.println("Group 3: " + Arrays.toString(group3.toArray()));

Output

Group 1: [250342, 251234]
Group 2: [200010, 200011, 200011]
Group 3: [10.00, -2.00, 11.00]


CodePudding user response:

I think it is quite difficult to accomplish what you want using regex alone. According to another SO question, a capturing group that matches several times does not keep every match; only the last one is captured.
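To make the "only the last match is kept" behaviour concrete, here is a minimal sketch. The class name `RepeatedGroupDemo`, the helper `lastCapture` and the pattern are made up for illustration; the pattern simply repeats a single date/value group over the sample text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // Hypothetical helper: returns what group 1 holds after the repeated
    // group has matched every date/value pair in the input.
    static String lastCapture(String input) {
        // Group 1 is one "date value" pair; the outer (?:...)+ repeats it.
        Pattern p = Pattern.compile("(?:(\\d+\\h+[-+]?\\d+(?:\\.\\d+)?)\\s*)+");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Two pairs are matched, but only the second one is still available.
        System.out.println(lastCapture("200010 10.00 200011 -2.00"));
        // prints: 200011 -2.00
    }
}
```

The first pair (200010 10.00) is matched but overwritten, which is why a single pattern with one repeated group cannot return all dates and values on its own.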

My suggestion is to split your string into lines in Java, iterate through the lines, check whether each line contains the substring you are looking for ("Jack/M"), and then extract the different parts with simpler regex patterns instead of trying to match one long regex against the whole string.

A good walk through on how to find matches for a regex in a string: https://www.tutorialspoint.com/getting-the-list-of-all-the-matches-java-regular-expressions
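The line-by-line suggestion could be sketched roughly as follows. This is only one possible reading of it: the class name `LineByLineDemo`, the method `extract` and the two small patterns are assumptions, and the date/value pairs are searched only in the text after the marker so that numbers such as 12345 earlier in the line are not picked up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineByLineDemo {
    // Two simple patterns, each applied to a single line.
    private static final Pattern SECTION = Pattern.compile("#Section(\\d+)");
    private static final Pattern PAIR = Pattern.compile("(\\d+)\\h+([-+]?\\d+(?:\\.\\d+)?)");

    // Returns three lists in order: section numbers, dates, values.
    static List<List<String>> extract(String input, String marker) {
        List<String> sections = new ArrayList<>();
        List<String> dates = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (String line : input.split("\\R")) {
            int at = line.indexOf(marker);
            if (at < 0) {
                continue; // this line does not mention the marker
            }
            Matcher s = SECTION.matcher(line);
            if (s.find()) {
                sections.add(s.group(1));
            }
            // Scan only the part of the line that follows the marker.
            Matcher p = PAIR.matcher(line.substring(at + marker.length()));
            while (p.find()) {
                dates.add(p.group(1));
                values.add(p.group(2));
            }
        }
        return Arrays.asList(sections, dates, values);
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,200010 10.00 200011 -2.00,\n"
                + "#Section250322,Main,First/HS/12345/Aaron/N,200010 17.00,\n"
                + "#Section250399,Main,First/HS/12345/Jimmy/N,200010 12.00,\n"
                + "#Section251234,Main,First/HS/12345/Jack/M,200011 11.00";
        List<List<String>> groups = extract(s, "/Jack/M,");
        System.out.println("Group 1: " + groups.get(0));
        System.out.println("Group 2: " + groups.get(1));
        System.out.println("Group 3: " + groups.get(2));
    }
}
```

Because each pattern is small and applied with `find()` in a loop, a line with one pair and a line with two pairs are handled the same way.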
