Regex JS- find two groups in a string using regex

Time:03-21

I am trying to understand how to get two captured groups with a regex (JS) from the following string:

"Group: i_am_group |SubGroup: i_am_sub_group"

In the end I want group1: i_am_group and group2: i_am_sub_group.

The rules are:

Extract the first word after "Group: " into group1
Extract the first word after "SubGroup: " into group2

I need to implement these two rules with a regex so I can run it with the match() function in JavaScript.

I was trying to do the following:

(?<=Group:\s)(\w+) ((?<=|SubGroup:\s)(\w*))

and the result was:

[image: results]

Thanks in advance.

CodePudding user response:

| has special meaning in regular expressions, it's used to specify alternatives. You need to escape it to match it literally.
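To illustrate the point about alternation, here is a small sketch comparing an unescaped and an escaped `|` on the string from the question. The variable names are only for illustration.

```javascript
const str = "Group: i_am_group |SubGroup: i_am_sub_group";

// Unescaped: the pattern is split into two alternatives,
// "Group:\s(\w+) " OR "SubGroup:\s(\w*)". The first alternative
// matches at index 0, so the second capture group never participates.
const unescaped = str.match(/Group:\s(\w+) |SubGroup:\s(\w*)/);
console.log(unescaped[1]); // "i_am_group"
console.log(unescaped[2]); // undefined

// Escaped: \| matches a literal pipe, so both groups are filled.
const escaped = str.match(/Group:\s(\w+) \|SubGroup:\s(\w*)/);
console.log(escaped[1]); // "i_am_group"
console.log(escaped[2]); // "i_am_sub_group"
```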

There's no need to use lookbehinds when you're capturing the part after that. The purpose of lookarounds is to keep them out of the matched string, but if you're only interested in the capture groups this is irrelevant.
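As a rough comparison of the two approaches, the sketch below extracts the same word once with a lookbehind and once with a capture group. It assumes an engine that supports lookbehind (ES2018 or later).

```javascript
const str = "Group: i_am_group |SubGroup: i_am_sub_group";

// Lookbehind: the prefix is checked but kept out of the overall match (index 0).
const viaLookbehind = str.match(/(?<=Group:\s)\w+/);
console.log(viaLookbehind[0]); // "i_am_group"

// Capture group: the prefix is part of the overall match,
// but index 1 holds only the captured word.
const viaCapture = str.match(/Group:\s(\w+)/);
console.log(viaCapture[0]); // "Group: i_am_group"
console.log(viaCapture[1]); // "i_am_group"
```

If you only ever read the capture groups, the second form gives the same word without needing lookbehind support.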

This regexp should work for you:

Group:\s(\w+) \|SubGroup:\s(\w*)
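One way to use that pattern with match() is sketched below. The helper name `extractGroups` and its return shape are assumptions made for illustration, not part of the answer.

```javascript
// Hypothetical helper: returns { group1, group2 } or null when nothing matches.
function extractGroups(input) {
  const m = input.match(/Group:\s(\w+) \|SubGroup:\s(\w*)/);
  if (m === null) return null;
  // Index 0 is the whole match; indexes 1 and 2 are the capture groups.
  const [, group1, group2] = m;
  return { group1, group2 };
}

const result = extractGroups("Group: i_am_group |SubGroup: i_am_sub_group");
console.log(result); // { group1: "i_am_group", group2: "i_am_sub_group" }
console.log(extractGroups("no match here")); // null
```

Checking for null first matters because match() returns null, not an empty array, when the pattern does not match.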


CodePudding user response:

If by "word" you're happy with the definition of \w (which is [A-Za-z0-9_]; more below), you can do it like this:

const rex = /Group:\s*(\w+).*?SubGroup:\s*(\w+)/;

Add the i flag if you want to allow Group and SubGroup to be in lower case.
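A quick sketch of the difference the i flag makes, using a made-up lower-case input:

```javascript
// Hypothetical lower-case input, for illustration only.
const lower = "group: alpha |subgroup: beta";

// Without the i flag the literal text is case-sensitive, so nothing matches.
const strict = lower.match(/Group:\s*(\w+).*?SubGroup:\s*(\w+)/);
console.log(strict); // null

// With the i flag, "group:" and "subgroup:" are accepted too.
const relaxed = lower.match(/Group:\s*(\w+).*?SubGroup:\s*(\w+)/i);
console.log(relaxed[1], relaxed[2]); // "alpha" "beta"
```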

That looks for Group:, allows optional whitespace after it, then captures the run of "word" characters that follows; next it lazily skips any characters (.*?) until it reaches SubGroup:, allows optional whitespace, and captures the "word" characters after that.
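Because the whitespace and the text between the two labels are both optional in this pattern, it tolerates some variation in the input. The second string below is a made-up variant to show that.

```javascript
const rex = /Group:\s*(\w+).*?SubGroup:\s*(\w+)/;

// The string from the question.
const a = "Group: i_am_group |SubGroup: i_am_sub_group".match(rex);
console.log(a[1], a[2]); // "i_am_group" "i_am_sub_group"

// Hypothetical variant: no space after "Group:", a different separator,
// and extra spaces after "SubGroup:".
const b = "Group:i_am_group ; SubGroup:   i_am_sub_group".match(rex);
console.log(b[1], b[2]); // "i_am_group" "i_am_sub_group"
```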

Live Example:

const str = "Group: i_am_group |SubGroup: i_am_sub_group";
const rex = /Group:\s*(\w+).*?SubGroup:\s*(\w+)/;
console.log(str.match(rex));

If you want a different definition of "word" character than \w, use [something_here]+ instead of \w+, where between the [ and ] you list the characters and character ranges you want to treat as "word" characters.
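As a minimal sketch of swapping in a custom character class, the example below treats ASCII letters and hyphens as "word" characters. The input string and the choice of class are assumptions for illustration.

```javascript
// Hypothetical input containing hyphenated words.
const hyphenated = "Group: well-known |SubGroup: co-op";

// \w does not include "-", so the capture stops at the hyphen.
const withW = hyphenated.match(/Group:\s*(\w+)/);
console.log(withW[1]); // "well"

// A custom class that allows ASCII letters and "-" captures the whole word.
const custom = hyphenated.match(/Group:\s*([A-Za-z-]+).*?SubGroup:\s*([A-Za-z-]+)/);
console.log(custom[1], custom[2]); // "well-known" "co-op"
```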

For instance, in English, we usually don't consider _ as part of a word (though your examples use it, so I'll leave it in), but we often consider - to be part of a word. We also frequently allow letters borrowed from other languages like é and ñ, so you might want those in the character class. You might go further and (in ES2018+ environments) use Unicode's definition of a "letter", which is written \p{Letter} (and requires the u flag on the expression):

const rex = /Group:\s*([-\p{Letter}0-9_]+).*?SubGroup:\s*([-\p{Letter}0-9_]+)/u;

(The - at the very beginning is treated literally, not as an indicator of a range.)

Live Example:

const str = "Group: i_am_group |SubGroup: i_am_sub_group";
const rex = /Group:\s*([-\p{Letter}0-9_]+).*?SubGroup:\s*([-\p{Letter}0-9_]+)/u;
console.log(str.match(rex));
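The question's string contains only ASCII, so it does not show what \p{Letter} adds. The sketch below uses a made-up input with accented letters (written as \u escapes so the code points are unambiguous) and assumes an engine with Unicode property escape support.

```javascript
// Hypothetical input: "café" and "niño" with precomposed accented letters.
const accented = "Group: caf\u00e9 |SubGroup: ni\u00f1o";

// \w is ASCII-only, so the capture stops before the accented letter.
const ascii = accented.match(/Group:\s*(\w+)/);
console.log(ascii[1]); // "caf"

// \p{Letter} with the u flag accepts letters from any script.
const unicodeRex = /Group:\s*([-\p{Letter}0-9_]+).*?SubGroup:\s*([-\p{Letter}0-9_]+)/u;
const uni = accented.match(unicodeRex);
console.log(uni[1], uni[2]); // "café" "niño"
```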
