Capturing Arbitrary Multiple Groups with Regex

Time:12-08

I am trying to use regular expressions in Java to capture data from the following string:

SettingName = "Value1",0x2,3,"Value4 contains spaces", "Value5 has a space before the string that is ignored"

The string may be preceded by an arbitrary amount of whitespace, which can be ignored. It may also contain more or fewer values than listed here; this is just an example.

My intent is to capture these groups:

  • Group 1: SettingName
  • Group 2: "Value1" (with the quotes)
  • Group 3: 0x2
  • Group 4: 3
  • Group 5: "Value4 contains spaces"
  • Group 6: "Value5 has a space before the string that is ignored"

The regex I am trying to use is:

    \s*([\w\/.-]+)\s*=(?:\s*(\"?[^\",]*\"?)(?:,|\s*$))+
    \s* -> Consume an arbitrary amount of whitespace
       ( -> Start a capturing group (group 1)
        [\w\/.-] -> Get a character of the SettingName, which may contain alphanumeric characters, /, ., and -
                + -> Get the previous token one or more times (so group 1 is not blank)
                 ) -> End the capturing group
                  \s* -> Consume an arbitrary amount of whitespace
                     = -> Consume the equals sign
                      (?: -> Start a non-capturing group
                         \s* -> Consume an arbitrary amount of whitespace
                            ( -> Start a capturing group (group 2)
                             \"? -> Consume a quote, if it exists
                                [^\",]* -> Consume any number of non-quote, non-comma characters
                                       \"? -> Consume the end quote, if it exists
                                          ) -> End the capturing group
                                           (?: -> Start a non-capturing group
                                              ,|\s*$ -> Match either a comma or the end of the line (string?)
                                                    ) -> End the non-capturing group
                                                     ) -> End the outer non-capturing group
                                                      + -> Match the outer non-capturing group one or more times
    

    I am using this code:

    private static final String regex = "\\s*([\\w\\/.-]+)\\s*=(?:\\s*(\"?[^\",]*\"?)(?:,|\\s*$))+";
    private static final Pattern settingPat = Pattern.compile(regex);
    ...
    public String text;
    public Matcher m;
    ...
    public void someMethod(String lineContents)
    {
        m = settingPat.matcher(text);
        if(!m.matches())
            ... (do other stuff)
        else
        {
            name = m.group(1);     // should be "SettingName"
            value[0] = m.group(2); // should be "\"Value1\""
            value[1] = m.group(3); // should be "0x2"
            ...
        }
    }
    

    With this code, the pattern matches the String, but it seems that I am only capturing the last group. Do Java and/or regular expressions support capturing an arbitrary number of groups with a repetition modifier?

    CodePudding user response:

    Your regex has only two capture groups, so you cannot get more than two groups in the result. You will have to run a loop to match all the repetitions.
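To illustrate the point, here is a minimal sketch using a deliberately simplified pattern (the pattern, the class name `RepeatedGroupDemo`, and the helper `lastValue` are assumptions for illustration, not the asker's code). It shows that the number of groups is fixed by the pattern, and that a capturing group inside a repeated group retains only its final iteration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Hypothetical, simplified pattern: group 1 is the key, group 2 sits
    // inside a repeated non-capturing group.
    static final Pattern P = Pattern.compile("(\\w+)=(?:(\\w+),?)+");

    // Returns what group 2 holds after a full match, or null if the line does not match.
    static String lastValue(String line) {
        Matcher m = P.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        Matcher m = P.matcher("key=a,b,c");
        if (m.matches()) {
            System.out.println(m.groupCount()); // 2: fixed by the pattern, not by the input
            System.out.println(m.group(1));     // key
            System.out.println(m.group(2));     // c: only the final iteration is retained
        }
    }
}
```

The earlier iterations ("a" and "b") are overwritten as the repetition advances, which is why a loop over successive matches is needed.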

    You may use this regex in a while loop to get all matches:

    (?:([\w/.-]+)\h*=|(?!^)\G,)\h*((\"?)[^\",]*\3)
    

    \G asserts the position at the end of the previous match, or at the start of the string for the first match. Because the pattern uses (?!^), \G is forced to match only at the end of the previous match.
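The effect of `(?!^)\G` can be seen in isolation with a reduced pattern. This is only a sketch: the literal `key=` prefix, the class name `ContinuationDemo`, and the helper `values` are assumptions made for the example. A comma counts as a continuation only when it sits directly at the end of the previous match, so a comma elsewhere in the input, or at the very start, is not picked up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContinuationDemo {
    // Hypothetical, reduced pattern: start at "key=", or continue with a comma
    // that directly follows the previous match.
    static final Pattern P = Pattern.compile("(?:key=|(?!^)\\G,)(\\w+)");

    // Collects group 1 of every successive match.
    static List<String> values(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // ",d" is not adjacent to the previous match, so "d" is skipped.
        System.out.println(values("key=a,b,c other,d")); // [a, b, c]
        // (?!^) stops \G from matching at the start of the input.
        System.out.println(values(",x"));                // []
    }
}
```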

    Code:

    final String regex = "(?:([\\w/.-]+)\\h*=|(?!^)\\G,)\\h*((\"?)[^\",]*\\3)";
    final String string = "SettingName = \"Value1\",0x2,3,\"Value4 contains spaces\", \"Value5 has a space before the string that is ignored\"";
        
    final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
    final Matcher matcher = pattern.matcher(string);
        
    while (matcher.find()) {
        if (matcher.group(1) != null)
            System.out.println(matcher.group(1));
        System.out.println("\t=> " + matcher.group(2));
    }
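
If the values need to be stored rather than printed, the same loop could be wrapped in a helper along these lines. This is a sketch, not part of the original answer: the class name `SettingParser`, the method `parse`, and the choice to return the setting name as element 0 of the list are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SettingParser {
    // Same pattern and flag as in the loop above.
    private static final Pattern SETTING = Pattern.compile(
            "(?:([\\w/.-]+)\\h*=|(?!^)\\G,)\\h*((\"?)[^\",]*\\3)",
            Pattern.MULTILINE);

    // Hypothetical helper: element 0 is the setting name, the rest are its values.
    static List<String> parse(String line) {
        List<String> parts = new ArrayList<>();
        Matcher m = SETTING.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                parts.add(m.group(1)); // only set on the match that contains the name
            }
            parts.add(m.group(2));     // the value, quotes included
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "SettingName = \"Value1\",0x2,3,\"Value4 contains spaces\"";
        for (String part : parse(line)) {
            System.out.println(part);
        }
    }
}
```

With this layout, `parse(line).get(0)` corresponds to Group 1 in the question and the remaining elements to Groups 2 onward.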
    