Regex for parsing a simple sentence words delimited by double quotes

Time: 10-31

I have an example sentence that looks like this:

""Music"",""EDM / Electronic"",""organizer: Tiny Toons""

I want to parse this sentence into the tokens:

["Music", "EDM / Electronic", "organizer: Tiny Toons"]

My regex-fu is quite limited, and I'm under some time pressure.

I was wondering whether someone could help me construct a regex (compatible with Java 8, since I'm applying it from Clojure) to parse out these tokens.

Thank you, Jason.

Answer:

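Assuming the sentence is the entire input string and that the tokens themselves never contain a comma or a double quote, you could just use

"[^,\"]+"

which matches each run of one or more characters that are neither a comma nor a double quote.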

If the above assumptions are not correct, please give examples of possible input strings and details of what characters can appear within the sections you want to match.

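A simple Java example of how to use the regex:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String sentence = "\"\"Music\"\",\"\"EDM / Electronic\"\",\"\"organizer: Tiny Toons\"\"";
// Each match is one maximal run of characters that is neither a comma nor a double quote
Matcher matcher = Pattern.compile("[^,\"]+").matcher(sentence);
List<String> matches = new ArrayList<String>();
while (matcher.find()) {
    matches.add(matcher.group());
}
System.out.println(matches); // [Music, EDM / Electronic, organizer: Tiny Toons]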
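If the tokens can contain commas (for example a hypothetical token such as ""Rock, Pop""), the character-class pattern above would split that token in two. One possible variant, assuming every token is wrapped in a pair of doubled quotes, never contains "" itself, and the input is a single line, is to match the quote pairs explicitly and read the token from a capture group. This is a sketch under those assumptions; the class and method names (QuotedTokens, tokens) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTokens {
    // Hypothetical helper: a doubled quote pair, then the shortest run of text
    // (captured as group 1, reluctant .*?), then the next doubled quote pair.
    // Assumes tokens never contain "" themselves and the input is a single line.
    private static final Pattern TOKEN = Pattern.compile("\"\"(.*?)\"\"");

    static List<String> tokens(String sentence) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(sentence);
        while (matcher.find()) {
            // group(1) is the text between the quote pairs, without the quotes
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // "Rock, Pop" is a hypothetical token that contains a comma
        String sentence = "\"\"Music\"\",\"\"Rock, Pop\"\",\"\"organizer: Tiny Toons\"\"";
        // Print one token per line so a comma inside a token stays unambiguous
        for (String token : tokens(sentence)) {
            System.out.println(token);
        }
    }
}
```

The reluctant quantifier .*? stops at the first closing "" rather than the last one, so each find() yields exactly one token, and the pattern is compiled once and reused. Everything used here is available in Java 8.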