Regex to match characters after the last colon that is not within curly brackets

Time:03-11

I need a regex that matches (and lists) all 'modifiers' of a string. Modifiers are the individual letters after the last : in the string. A modifier can have variables, which are written in curly brackets, e.g. a{variable}. Variables may contain the character :, which makes this a bit tricky, because we must look for the last : that is NOT between { and }. This is currently my biggest problem, see Example 6 below.
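To make the rule concrete without any regex: "the last : that is not between { and }" can be found in a single pass that tracks whether we are currently inside brackets. The following is only a minimal sketch of the intended behaviour, assuming brackets are balanced and never nested; the function name lastTopLevelColon is made up for illustration.

```javascript
// Hypothetical helper (not part of any library): returns the index of the
// last ':' that sits outside {...}, or -1 if there is none.
// Assumes brackets are balanced; an unmatched '}' is simply ignored.
function lastTopLevelColon(text) {
  let depth = 0;   // how many '{' are currently open
  let last = -1;   // index of the most recent ':' seen at depth 0
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '{') {
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
    } else if (ch === ':' && depth === 0) {
      last = i;
    }
  }
  return last;
}

console.log(lastTopLevelColon('something:a{something:2}')); // 9
console.log(lastTopLevelColon('something'));                // -1
```

Everything after that index is the modifier list; a result of -1 corresponds to Example 7, where nothing should match.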

(If it matters, the target language for this will be javascript.)

I already have this working for most cases, but there are a few edge cases that I cannot get to work.

My regex so far is:

/(?!.*:)([a-z](\{.*?\})*)/g

Example 1: Single modifier

something:a should match a - working fine

Example 2: Multiple modifiers

something:abc should match a, b, and c - working fine

Example 3: Single modifier with variable

something:a{something} should match a{something} - working fine

Example 4: Single modifier with multiple variables

something:a{something}{something} should match a{something}{something} - working fine

Example 5: Multiple modifiers with variables

something:ab{something}cd{something}{something}efg should match a, b{something}, c, d{something}{something}, e, f, g - working fine

Example 6: Variable containing :

something:a{something:2} - should match a{something:2} - does NOT work. I probably need to modify the negative lookahead somehow to ignore colons in curly brackets, but I couldn't find out how to do that.

Example 7: String not containing a :

something - should match nothing, but matches each letter individually. This may or may not be easy to fix, but my brain currently can't work this out.

Here is a link to test / play around with this regex and the examples: https://regexr.com/6h4h0

If anyone can help me to figure out how to make the regex work for example 6 and 7, I'd be very grateful!

CodePudding user response:

You can use

// Captures everything after the last ':' that is followed only by modifiers.
const regex = /.*:((?:[a-zA-Z](?:{[^{}]*})*)+)$/;
// Splits the captured tail into individual modifiers.
const extract_rx = /[a-zA-Z](?:{[^{}]*})*/g;
const texts = [
  'something:a',
  'something:abc',
  'something:a{something}',
  'something:a{something}{something}',
  'something:ab{something}cd{something}{something}efg',
  'something:a{something:2}',
  'something:a{something:2}b{something:3}',
  'something',
];
for (const text of texts) {
  const m = text.match(regex);
  if (m) {
    const matches = m[1].match(extract_rx);
    console.log(text, '=>', matches);
  } else {
    console.log(text, '=> NO MATCH');
  }
}

See the main regex demo. Details:

  • .*: - matches any zero or more chars other than line break chars, as many as possible, and then a : followed by...
  • ((?:[a-zA-Z](?:{[^{}]*})*)+) - Group 1: one or more sequences of
    • [a-zA-Z] - an ASCII letter
    • (?:{[^{}]*})* - zero or more sequences of a {, zero or more chars other than { and } and then a } char
  • $ - end of string.

Once there is a match, Group 1 is parsed again to extract all sequences of a letter and then any zero or more {...} substrings right after from it.
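If it helps, the two steps can be wrapped into one function. This is just a sketch: the name extractModifiers and the choice to return an empty array when nothing matches are my own assumptions, and the first pattern is written with the + quantifier on Group 1 that the "one or more sequences" description calls for.

```javascript
// Step 1: capture the modifier tail after the last ':' outside {...}.
const TAIL_RX = /.*:((?:[a-zA-Z](?:{[^{}]*})*)+)$/;
// Step 2: split that tail into letter-plus-variables units.
const MODIFIER_RX = /[a-zA-Z](?:{[^{}]*})*/g;

// Hypothetical wrapper: returns the list of modifiers, or [] if the
// string has no valid modifier tail (e.g. no ':' at all).
function extractModifiers(text) {
  const m = text.match(TAIL_RX);
  return m ? m[1].match(MODIFIER_RX) : [];
}

console.log(extractModifiers('something:ab{something}c'));
// [ 'a', 'b{something}', 'c' ]
console.log(extractModifiers('something:a{something:2}'));
// [ 'a{something:2}' ]
console.log(extractModifiers('something'));
// []
```

Note that the whole tail has to consist of valid modifiers: a string such as something:a1 yields [] here rather than a partial result.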

CodePudding user response:

What you could do instead is use a positive lookbehind to make sure there is a colon somewhere before the matched string, essentially switching (?!.*:) for (?<=:.*).


const regex = /(?<=:.*)([a-z](\{.*?\})*)/g;

const strings = [
  "something:a",
  "something:abc",
  "something:a{something}",
  "something:a{something}{something}",
  "something:ab{something}cd{something}{something}efg",
  "something:a{something:2}",
  "something",
];

for (const string of strings) {
  console.log(string.match(regex));
}
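One caveat that may be worth checking, assuming your input can contain more than one colon outside curly brackets (none of the examples in the question do): the lookbehind only requires that some colon appears earlier, not that it is the last one, so letters between two colons are matched as well. A quick check, with variable names chosen just for this sketch:

```javascript
// Same pattern as above; lookbehind needs an engine with ES2018 support.
const lookbehindRx = /(?<=:.*)([a-z](\{.*?\})*)/g;

// Only one top-level colon: behaves as the question asks.
const single = 'something:a{something:2}'.match(lookbehindRx);
console.log(single); // [ 'a{something:2}' ]

// Two top-level colons: 'b' is matched too, although only 'c'
// follows the last colon.
const multiple = 'a:b:c'.match(lookbehindRx);
console.log(multiple); // [ 'b', 'c' ]

// No colon at all: match() returns null rather than an empty array.
const none = 'something'.match(lookbehindRx);
console.log(none); // null
```

If such inputs cannot occur, the pattern works as is; otherwise the text before the last top-level colon would need to be cut off first.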

CodePudding user response:

Not sure if this is what you want:

:([a-z\{.*?\}0-9])*

I would try longer, but have to go catch a flight.
