Use regex to get 2 specific groups of substring

Time:08-03

String s = #Section250342,Main,First/HS/12345/Jack/M,2000 10.00,
#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,
#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,
#Section251234,Main,First/HS/12345/Jack/M,2000 11.00

Wherever /Jack/M appears in the string, I want to pull the section numbers (250342, 251234) and the values (10.00, 11.00) associated with it, using regex.

I tried something like this https://regex101.com/r/4te0Lg/1 but it still doesn't work.

.Section(\d+(?:\.\d+)?).*/Jack/M

CodePudding user response:

You could use 2 capture groups, and use a tempered greedy token approach so the match does not cross the next #Section followed by a digit.

#Section(\d+)(?:(?!#Section\d).)*\bJack/M,\d+\h+(\d+(?:\.\d+)?)\b

Explanation

  • #Section(\d+) Match #Section and capture 1+ digits in group 1
  • (?:(?!#Section\d).)* Match any character, as long as #Section followed by a digit does not start at that position
  • \bJack/M, Match the word Jack and /M,
  • \d+\h+ Match 1+ digits and 1+ horizontal whitespace characters
  • (\d+(?:\.\d+)?) Capture group 2, match 1+ digits and an optional decimal part
  • \b A word boundary


In Java:

String regex = "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b";
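
To show how the two groups can be read back, here is a minimal sketch that runs the pattern above over the sample string from the question with Matcher.find(). The class name JackSections, the helper extract and the "section=value" output format are made up for illustration; only the regex comes from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JackSections {

    // Regex from the answer above; \h (horizontal whitespace) needs Java 8 or later.
    private static final Pattern JACK = Pattern.compile(
            "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b");

    // Hypothetical helper: returns one "section=value" entry per Jack/M section.
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = JACK.matcher(input);
        while (m.find()) { // each call to find() advances to the next match
            results.add(m.group(1) + "=" + m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,"
                + "#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,"
                + "#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,"
                + "#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        System.out.println(extract(s)); // prints [250342=10.00, 251234=11.00]
    }
}
```

Because the tempered token stops at the next #Section, the Aaron and Jimmy sections cannot borrow a later Jack/M and are skipped.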

CodePudding user response:

If the only parts of each section that change are the section number, the name of the person and the last value (like in your example) then you can make a pattern very easily by using one of the sections where Jack appears and replacing the numbers you want by capturing groups.

Example:

#Section250342,Main,First/HS/12345/Jack/M,2000 10.00

becomes,

#Section(\d+),Main,First/HS/12345/Jack/M,2000 (\d+\.\d{2})

If the section substring keeps the format but the other parts of it may change then just replace the rest like this:

#Section(\d+),\w+,(?:\w+/)*Jack/M,\d+ (\d+\.\d{2})

I'm assuming that "Main" is a class, "First/HS/..." is a path and that the last value always has 2 and only 2 decimal places.

  • \d - A digit: [0-9]
  • \w - A word character: [a-zA-Z_0-9]
  • + - one or more times
  • * - zero or more times
  • {2} - exactly 2 times
  • () - a capturing group
  • (?:) - a non-capturing group

For reference see: https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/regex/Pattern.html

Simple Java example on how to get the values from the capturing groups using java.util.regex.Pattern and java.util.regex.Matcher

import java.util.regex.*;

public class GetMatch {

    public static void main(String[] args) {

        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        
        Pattern p = Pattern.compile("#Section(\\d+),\\w+,(?:\\w+/)*Jack/M,\\d+ (\\d+\\.\\d{2})");
        Matcher m;
        String[] tokens = s.split(",(?=#)"); //split the sections into different strings
        
        for(String t : tokens) //checks every string that we got with the split
        {   
            m = p.matcher(t);
            if(m.matches()) //if the string matches the pattern then print the capturing groups
                System.out.printf("Section: %s, Value: %s\n", m.group(1), m.group(2));
        }
    }
}
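
The example above splits first because matches() only succeeds when the whole token fits the pattern. As a possible alternative, the sketch below scans the unsplit string with find() and collects the results in a map. The class name FindWithoutSplit and the helper sectionValues are hypothetical, and it assumes the same section format as described above (a value with exactly 2 decimal places).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindWithoutSplit {

    // Same generalized pattern as above, with the decimal point escaped.
    private static final Pattern P = Pattern.compile(
            "#Section(\\d+),\\w+,(?:\\w+/)*Jack/M,\\d+ (\\d+\\.\\d{2})");

    // Hypothetical helper: maps each matching section number to its value,
    // keeping the order in which the sections appear.
    static Map<String, String> sectionValues(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = P.matcher(input);
        while (m.find()) { // find() looks for the pattern anywhere in the input
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,"
                + "#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,"
                + "#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,"
                + "#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        sectionValues(s).forEach((section, value) ->
                System.out.printf("Section: %s, Value: %s%n", section, value));
    }
}
```

This works without a lookahead because \w never matches a comma or #, so a match that starts at one section cannot run into the next one.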