Regex to capture comma separated groups of text in parentheses [Java]

Time:08-24

I have a string that contains one or more comma-separated values, each surrounded by single quotes and all enclosed in parentheses. It can be of the form os IN ('WIN', 'MAC', 'LNU') for multiple values, or just os IN ('WIN') for a single value.

I need to extract the values into a List.

I have tried this regex, but it captures all the values into a single list element, as one whole String 'WIN', 'MAC', instead of two separate String values WIN and MAC:

        List<String> matchList = new ArrayList<>();

        Pattern regex = Pattern.compile("\\((.+?)\\)");
        Matcher regexMatcher = regex.matcher(processedFilterString);

        while (regexMatcher.find()) {//Finds Matching Pattern in String
            matchList.add(regexMatcher.group(1));//Fetching Group from String
        }

Result:

Input: os IN ('WIN', 'MAC')
Output:
['WIN', 'MAC']
length: 1

In its current form, the regex matches one or more characters surrounded by parentheses and captures them all in a single group, which is probably why the result is just one string. How can I adapt it to capture each of the values separately?

Edit: some more details. The input string can contain multiple IN clauses alongside other criteria, such as id IN ('xxxxxx') AND os IN ('WIN', 'MAC'). Also, the values are not necessarily all the same length, so the input could be os IN ('WIN', 'MAC', 'LNUX').
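The single-element result can be reproduced with a minimal runnable sketch of the question's loop, assuming the pattern is \\((.+?)\\) (one or more characters, matched lazily). The class name WholeGroupDemo and the helper capture are hypothetical. Each call to find() consumes one whole parenthesized list, so group(1) holds everything between ( and ), quotes and commas included, and the loop yields one entry per IN clause rather than one per value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeGroupDemo {

    // The question's pattern: a literal "(", a lazy capture of one or more
    // characters, then a literal ")". The capture spans the entire list.
    private static final Pattern PARENS = Pattern.compile("\\((.+?)\\)");

    // Hypothetical helper: returns group(1) of every match, one per clause.
    static List<String> capture(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = PARENS.matcher(input);
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "os IN ('WIN', 'MAC')",
            "id IN ('xxxxxx') AND os IN ('WIN', 'MAC')"
        };
        for (String input : inputs) {
            List<String> groups = capture(input);
            System.out.println(groups.size() + " group(s): " + String.join(" | ", groups));
        }
        // 1 group(s): 'WIN', 'MAC'
        // 2 group(s): 'xxxxxx' | 'WIN', 'MAC'
    }
}
```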

Answer:

You could capture the contents of the IN clause and then split that comma-separated string:

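import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

List<String> matchList = null;

// Capture everything between the first "(" and the next ")".
Pattern regex = Pattern.compile("\\((.+?)\\)");
Matcher regexMatcher = regex.matcher(processedFilterString);

if (regexMatcher.find()) {
    // Strip the leading and trailing quote, then split on quote-comma-quote.
    String match = regexMatcher.group(1).replaceAll("^'|'$", "");
    String[] terms = match.split("'\\s*,\\s*'");
    matchList = Arrays.stream(terms).collect(Collectors.toList());
}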

Note that this handles only the first parenthesized group. If your input string can contain multiple IN clauses, the if would need to become a while loop that collects the terms from each match.
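A minimal sketch of that while-loop variant follows. It goes one step further and keys each list by the column name before IN, so os and id values stay separate. The class InClauseSplitter, the method extract, and the column-capturing pattern are illustrative assumptions, not part of the answer above; the pattern also assumes an upper-case IN keyword, no ")" inside any value, and no embedded quotes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InClauseSplitter {

    // Hypothetical pattern: group 1 is the column name before IN,
    // group 2 is the text inside the parentheses that follow it.
    private static final Pattern IN_CLAUSE =
            Pattern.compile("(\\w+)\\s+IN\\s*\\((.+?)\\)");

    static Map<String, List<String>> extract(String filter) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = IN_CLAUSE.matcher(filter);
        while (m.find()) { // one iteration per IN clause
            // Same steps as the answer: strip the outer quotes, then split.
            String body = m.group(2).trim().replaceAll("^'|'$", "");
            List<String> values = Arrays.asList(body.split("'\\s*,\\s*'"));
            result.put(m.group(1), values);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("id IN ('xxxxxx') AND os IN ('WIN', 'MAC', 'LNUX')"));
        // {id=[xxxxxx], os=[WIN, MAC, LNUX]}
    }
}
```

If only the os values are needed, extract(filter).get("os") returns just that list.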

Answer:

From the examples in your question, it looks like your regular expression needs to find strings of at least three upper-case letters enclosed in single quotes.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution {

    public static void main(String[] args) {
        String s = "os IN ('WIN', 'MAC', 'LNUX')";
        Pattern pattern = Pattern.compile("'([A-Z]{3,})'");
        Matcher matcher = pattern.matcher(s);
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group(1));
        }
        System.out.println(list);
    }
}

Running the above code produces the following output:

[WIN, MAC, LNUX]
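If the values are not guaranteed to be three or more upper-case letters, a looser variant of the same find() loop is '([^']*)', which captures any run of non-quote characters between two single quotes. This is a sketch under the assumption that no value contains an embedded single quote; the class name QuotedValues and the method extract are hypothetical. Because it matches every quoted value in the string, an input like id IN ('xxxxxx') AND os IN ('WIN', 'MAC') would also return xxxxxx, so for multi-clause filters it is best applied to the text of a single IN clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedValues {

    // Any run of non-quote characters between two single quotes.
    // Assumption: values never contain an embedded single quote.
    private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

    static List<String> extract(String s) {
        List<String> list = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(s);
        while (matcher.find()) {
            // Each match consumes its closing quote, so the ", " between
            // values is never mistaken for a quoted value.
            list.add(matcher.group(1));
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(extract("os IN ('WIN', 'Mac', 'LNUX2')"));
        // [WIN, Mac, LNUX2]
    }
}
```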