For example, in the string:
=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=
How can I match only the substrings that are enclosed by the same character on both sides, either "=" on both sides or "/" on both sides, and not "=" on one side and "/" on the other? I want to extract only the matches that start with a capital letter and have 3 or more letters in total between the = or /.
I tried (=|/) on the left and right of the group that captures the substring, but that also matches the cases where there is = on one side and / on the other.
Keep in mind that I'm still learning regex and I don't know how to make it strictly match on both sides.
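To illustrate the problem, here is a minimal sketch of the kind of pattern described above. The exact pattern from the attempt is not shown, so this reconstruction is an assumption: two independent (=|\/) alternations around an uppercase-word group. Because the two alternations know nothing about each other, the closing side is free to pick / when the opening side picked =:

```javascript
// Hypothetical reconstruction of the attempt: two independent alternations.
const text = "=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=";
const attempt = /(=|\/)([A-Z][a-z]{2,})(=|\/)/g;

// Collect the captured word (Group 2) from every match.
const found = Array.from(text.matchAll(attempt), (m) => m[2]);
console.log(found); // → [ 'Hawai', 'Cyprus', 'Invalid' ]
```

"Invalid" slips through because it is matched as "=Invalid/". Requiring the same character on both sides needs a backreference to whatever the opening group captured.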
Answer:
You can use either of these two regexes:
/(?<=([\/=]))[A-Z][a-z]{2,}(?=\1)/g
/([\/=])([A-Z][a-z]{2,})\1/g
Details of regex #1:
(?<=([\/=])) - a positive lookbehind that matches a location immediately preceded by / or = (captured into Group 1)
[A-Z] - an uppercase letter
[a-z]{2,} - two or more lowercase letters
(?=\1) - a positive lookahead that matches a location immediately followed by the same value as in Group 1.
Note that the second regex does not use lookarounds: the delimiters are part of the match, and the value you want is captured into Group 2.
See the JavaScript demo below:
const text = "=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=";
console.log( text.match(/(?<=([\/=]))[A-Z][a-z]{2,}(?=\1)/g) );
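As a quick check of the lookbehind detail above, the sketch below runs a non-global copy of regex #1 once with exec() (the non-global copy is just for illustration). It shows that Group 1 is still filled in even though it sits inside the lookbehind, while the match itself contains only the word:

```javascript
const text = "=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=";
// Non-global copy of regex #1, so exec() always searches from index 0.
const re1 = /(?<=([\/=]))[A-Z][a-z]{2,}(?=\1)/;

const m = re1.exec(text);
// m[0] is only the word: the lookarounds check the delimiters without consuming them.
// m[1] is the delimiter captured inside the lookbehind.
console.log(m[0], m[1], m.index); // → Hawai = 1
```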
Regex #2 test:
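const text = "=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=";
const re = /([\/=])([A-Z][a-z]{2,})\1/g;
const matches = [];
let m;
// With the g flag, each exec() call resumes at re.lastIndex and finds the next match.
while ((m = re.exec(text)) !== null) {
  matches.push(m[2]); // Group 2 holds the word without its delimiters
}
console.log(matches);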
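The two regexes can behave differently when adjacent fields share a delimiter, as in "=Abc=Def=". The sample string does not show this case, so whether it matters for your data is an assumption. The sketch below runs both patterns on such an input (using matchAll for regex #2 instead of the exec loop):

```javascript
// Made-up input: two fields sharing the middle "=" delimiter.
const sample = "=Abc=Def=";

// Regex #1: lookarounds only assert the delimiters, so the middle "=" can close
// "Abc" and also open "Def".
const withLookarounds = sample.match(/(?<=([\/=]))[A-Z][a-z]{2,}(?=\1)/g);

// Regex #2: the delimiters are consumed, so once "=Abc=" is matched the middle
// "=" is no longer available to open the next field.
const consuming = Array.from(sample.matchAll(/([\/=])([A-Z][a-z]{2,})\1/g), (m) => m[2]);

console.log(withLookarounds); // → [ 'Abc', 'Def' ]
console.log(consuming);       // → [ 'Abc' ]
```

If shared delimiters can occur, the lookaround version is the one that returns every field.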
Answer:
either ...
/([=\/])(?<content>.*?)\1/g
... which uses capturing groups and a backreference, or ...
/(?<delimiter>[=\/])(?<content>.*?)\k<delimiter>/g
... with an additional named capturing group reused as a named backreference.
const sampleDate = `=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=`;
// see ... [https://regex101.com/r/86JtMp/2]
const regXGroups = /([=\/])(?<content>.*?)\1/g;
// see ... [https://regex101.com/r/86JtMp/1]
const regXNamedGroups = /(?<delimiter>[=\/])(?<content>.*?)\k<delimiter>/g;
console.log(
  Array
    .from(
      sampleDate.matchAll(regXGroups)
    )
    //.map(({ groups }) => groups?.content)
    .map(([match, delimiter, content]) => content)
);
console.log(
  [...sampleDate.matchAll(regXNamedGroups)]
    //.map(({ groups }) => groups?.content)
    .map(({ groups: { content } }) => content)
);
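For reference, this is what the generic content pattern returns for the sample. Since .*? accepts any characters, the result includes every run between two identical delimiters, even runs that do not start with a capital letter and a run that contains the other delimiter:

```javascript
const sampleDate = `=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=`;

// Same pattern as regXGroups: a lazy run of any characters between two identical delimiters.
const regXGroups = /([=\/])(?<content>.*?)\1/g;
const contents = Array.from(sampleDate.matchAll(regXGroups), (m) => m.groups.content);

console.log(contents);
// → [ 'Hawai', 'Cyprus', 'Invalid/invalid', 'i5valid', 'I5valid', 'i' ]
```

So the content part still has to be narrowed to meet the capital-letter requirement.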
OP ... "I want to extract only the matches that start with a capital letter and have 3 or more letters in total between the = or /"
For this, and in order not to limit the code to ASCII / Basic Latin by using character classes like [A-Z] / [a-zA-Z], one could use Unicode property escapes such as \p{Lu} for any uppercase letter and \p{L} for any letter (the u flag enables them) ...
/([=\/])(\p{Lu}\p{L}{2,})\1/gu
... or ...
/(?<delimiter>[=\/])(?<content>\p{Lu}\p{L}{2,})\k<delimiter>/gu
const sampleDate = `=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=`;
// see ... [https://regex101.com/r/86JtMp/4]
const regXGroups = /([=\/])(\p{Lu}\p{L}{2,})\1/gu;
// see ... [https://regex101.com/r/86JtMp/3]
const regXNamedGroups =
/(?<delimiter>[=\/])(?<content>\p{Lu}\p{L}{2,})\k<delimiter>/gu;
console.log(
  Array
    .from(
      sampleDate.matchAll(regXGroups)
    )
    .map(([match, delimiter, content]) => content)
);
console.log(
  [...sampleDate.matchAll(regXNamedGroups)]
    .map(({ groups: { content } }) => content)
);
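To see the difference the property escapes make, the sketch below runs the ASCII class version and the Unicode version on a made-up input containing non-ASCII capitals and lowercase letters (the place names are only illustrative sample data):

```javascript
// "\u00C5land" is "Åland" and "Z\u00FCrich" is "Zürich"; the escapes keep the
// precomposed characters intact regardless of how the file is saved.
const text = "=\u00C5land=/Z\u00FCrich/=Oslo/";

const ascii = /([=\/])([A-Z][a-z]{2,})\1/g;
const unicode = /([=\/])(\p{Lu}\p{L}{2,})\1/gu;

// Return Group 2 (the word between the delimiters) of every match.
const pick = (re) => Array.from(text.matchAll(re), (m) => m[2]);

console.log(pick(ascii));   // → []
console.log(pick(unicode)); // → [ 'Åland', 'Zürich' ]
```

"Oslo" is rejected by both patterns because its delimiters differ; the ASCII version also misses the other two words, since Å and ü are outside [A-Z] and [a-z].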