I need a regex that matches (and lists) all 'modifiers' of a string. Modifiers are individual letters after the last : in the string. Modifiers can have variables, which are written in curly brackets, e.g. a{variable}. Variables may contain the character :, which makes it a bit tricky, because we must look for the last : that is NOT between { and }. This is currently my biggest problem; see Example 6 below.
(If it matters, the target language for this will be JavaScript.)
I already have this working for most cases, but there are a few edge cases that I cannot get to work.
My regex so far is:
/(?!.*:)([a-z](\{.*?\})*)/g
Example 1: Single modifier
something:a
should match a
- working fine
Example 2: Multiple modifiers
something:abc
should match a, b, and c
- working fine
Example 3: Single modifier with variable
something:a{something}
should match a{something}
- working fine
Example 4: Single modifier with multiple variables
something:a{something}{something}
should match a{something}{something}
- working fine
Example 5: Multiple modifiers with variables
something:ab{something}cd{something}{something}efg
should match a, b{something}, c, d{something}{something}, e, f, and g
- working fine
Example 6: Variable containing :
something:a{something:2}
- should match a{something:2}
- does NOT work. I probably need to modify the negative lookahead somehow to ignore colons in curly brackets, but I couldn't find out how to do that.
Example 7: String not containing a :
something
- should match nothing, but matches each letter individually. This may or may not be easy to fix, but my brain currently can't work this out.
Here is a link to test / play around with this regex and the examples: https://regexr.com/6h4h0
If anyone can help me to figure out how to make the regex work for example 6 and 7, I'd be very grateful!
CodePudding user response:
You can use
// Step 1: capture everything after the last ":" that is not inside {...}
const regex = /.*:((?:[a-zA-Z](?:{[^{}]*})*)+)$/;
// Step 2: split the captured tail into individual modifiers
const extract_rx = /[a-zA-Z](?:{[^{}]*})*/g;
const texts = [
  'something:a',
  'something:abc',
  'something:a{something}',
  'something:a{something}{something}',
  'something:ab{something}cd{something}{something}efg',
  'something:a{something:2}',
  'something:a{something:2}b{something:3}',
  'something',
];
for (const text of texts) {
  const m = text.match(regex);
  if (m) {
    const matches = m[1].match(extract_rx);
    console.log(text, '=>', matches);
  } else {
    console.log(text, '=> NO MATCH');
  }
}
See the main regex demo. Details:
.*: - matches any zero or more chars other than line break chars, as many as possible, and then a : followed by...
((?:[a-zA-Z](?:{[^{}]*})*)+) - Group 1: one or more sequences of
  [a-zA-Z] - an ASCII letter
  (?:{[^{}]*})* - zero or more sequences of a {, zero or more chars other than { and }, and then a } char
$ - end of string.
Once there is a match, Group 1 is parsed again to extract all sequences of a letter followed by zero or more {...} substrings.
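The two steps described above can be wrapped in one small function. This is only a sketch: the name extractModifiers is hypothetical (it is not part of the answer), the braces are escaped for readability, and it assumes variables are never nested, i.e. there is no { inside another {...}.

```javascript
// Hypothetical helper wrapping the two-step approach described above.
// Assumes variables are not nested (no "{" inside "{...}").
function extractModifiers(text) {
  // Step 1: find the tail after the last ":" that is followed only by
  // modifiers (a letter plus zero or more {...} groups) up to the end.
  const tail = /.*:((?:[a-zA-Z](?:\{[^{}]*\})*)+)$/.exec(text);
  if (tail === null) {
    return []; // no usable ":" in the string, so there are no modifiers
  }
  // Step 2: split the tail into individual modifiers.
  return tail[1].match(/[a-zA-Z](?:\{[^{}]*\})*/g) || [];
}

console.log(extractModifiers('something:abc'));            // [ 'a', 'b', 'c' ]
console.log(extractModifiers('something:a{something:2}')); // [ 'a{something:2}' ]
console.log(extractModifiers('something'));                // []
```

Returning an empty array instead of null for the no-match case is a design choice of this sketch; it lets callers iterate over the result without a null check.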
CodePudding user response:
What you could do instead is make sure there is a colon somewhere before the matched string with a positive lookbehind, essentially switching (?!.*:) for (?<=:.*).
const regex = /(?<=:.*)([a-z](\{.*?\})*)/g;
const strings = [
  "something:a",
  "something:abc",
  "something:a{something}",
  "something:a{something}{something}",
  "something:ab{something}cd{something}{something}efg",
  "something:a{something:2}",
  "something",
];
for (const string of strings) {
  console.log(string.match(regex));
}
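One thing worth checking with this variant: the lookbehind only requires some colon earlier in the string, not the last one. The short sketch below reuses the same pattern under the made-up name lookbehindRegex and adds one input that is not among the question's examples, a string with two colons outside of curly brackets. It also assumes a runtime with lookbehind support (ES2018).

```javascript
// Same pattern as in the answer above; the variable name is only for this sketch.
const lookbehindRegex = /(?<=:.*)([a-z](\{.*?\})*)/g;

// Examples 6 and 7 from the question behave as requested:
console.log('something:a{something:2}'.match(lookbehindRegex)); // [ 'a{something:2}' ]
console.log('something'.match(lookbehindRegex));                // null

// Extra input (not from the question): every letter after the FIRST colon matches.
console.log('some:thing:ab'.match(lookbehindRegex));
// [ 't', 'h', 'i', 'n', 'g', 'a', 'b' ]
```

If inputs with more than one colon outside of curly brackets can occur, this pattern alone does not restrict matches to the part after the last colon.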
CodePudding user response:
Not sure if this is what you want:
:([a-z\{.*?\}0-9])*
I would try longer, but have to go catch a flight.