Catch text using a regex with specific rules

Time:09-27

Ok, I'm trying to catch text using a regex with the following rules:

  • Each new line starts with the word type or tag, and : comes after that. | type or tag should be the capture group 1
  • A varchar might come after : | That varchar should be the capture group 2
  • \\ comes after that
  • A number comes after \\ | That number should be the capture group 3
  • ? might come after the number
  • If we have ?, a varchar might come after ? | That varchar should be the capture group 4
  • If we have ? a varchar, then : might come after that
  • If we have ? a varchar :, then a varchar might come after that | That varchar should be the capture group 5

Examples:

type:test\\1?value12:value9        // Should get: Group 1 = type, Group 2 = test, Group 3 = 1, Group 4 = value12, Group 5 = value9

type:\\22?value62:value3        // Should get: Group 1 = type, Group 2 = NULL, Group 3 = 22, Group 4 = value62, Group 5 = value3

My regex is:

/(type|tag):([^\\]+)?\\\\([0-9]{1,3})?\??([^\:]+):([^\:]+)?/i

I believe that it's not accurate, for example:

type:\\1p?hello:iii

The current regex captures 1 as Group 3 and p?hello as Group 4, but it should not match this input at all. Group 3 must be a number, optionally followed by ?, so type:\\1p?hello:iii doesn't follow the format we want.
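The behaviour described above can be reproduced with a short Python sketch. It assumes the character classes in the pattern carry + quantifiers and that the input contains two literal backslashes; Python's re module stands in for the original engine, and the variable names are illustrative only.

```python
import re

# Assumption: each character class in the question's pattern is followed by "+".
# In a raw string, \\\\ is the regex for two literal backslashes.
question_pattern = re.compile(
    r"(type|tag):([^\\]+)?\\\\([0-9]{1,3})?\??([^\:]+):([^\:]+)?",
    re.IGNORECASE,
)

# Raw string, so the input holds two literal backslashes after the colon.
bad_input = r"type:\\1p?hello:iii"

match = question_pattern.search(bad_input)
# Because \?? is optional, [^\:]+ is free to swallow "p?hello".
print(match.groups())  # ('type', None, '1', 'p?hello', 'iii')
```

The unwanted match comes from the negated class [^\:], which accepts any character except a colon, including p and ?.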

Can anyone help, please? Thanks!

CodePudding user response:

Try this

/(type|tag):(\w+)?\\\\([0-9]{1,3})?\??(\w+)?:?(\w+)?/gi

Tested here and php here

I think it's better to match word characters with \w than to just exclude the other characters
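As a rough check of this suggestion, here is a minimal Python sketch. It assumes each \w is followed by a + quantifier and that the inputs contain two literal backslashes; the variable names are made up for illustration.

```python
import re

# Assumption: "+" follows each \w in the suggested pattern.
suggested = re.compile(
    r"(type|tag):(\w+)?\\\\([0-9]{1,3})?\??(\w+)?:?(\w+)?",
    re.IGNORECASE,
)

# The first example from the question yields the five expected groups.
good = suggested.search(r"type:test\\1?value12:value9")
print(good.groups())  # ('type', 'test', '1', 'value12', 'value9')

# The pattern is not anchored and its tail is optional,
# so the invalid input still produces a partial match.
bad = suggested.search(r"type:\\1p?hello:iii")
print(bad.group(0))   # type:\\1p
print(bad.groups())   # ('type', None, '1', 'p', None)
```

So \w narrows what each group may contain, but rejecting the invalid line outright would additionally need anchors or a stricter structure around the optional parts.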

CodePudding user response:

You can use

/^(type|tag):([a-zA-Z0-9]*)\\\\([0-9]{1,3})(?:\?([a-zA-Z0-9]+)(?::([a-zA-Z0-9]+))?)?$/i

See the regex demo. Details:

  • ^ - start of string
  • (type|tag) - Group 1: type or tag
  • : - a colon
  • ([a-zA-Z0-9]*) - Group 2: zero or more alphanumeric chars
  • \\\\ - two backslashes
  • ([0-9]{1,3}) - Group 3: one, two or three digits
  • (?:\?([a-zA-Z0-9]+)(?::([a-zA-Z0-9]+))?)? - an optional sequence of
    • \? - a ? char
    • ([a-zA-Z0-9]+) - Group 4: one or more alphanumeric chars
    • (?::([a-zA-Z0-9]+))? - an optional sequence of
      • : - a colon
      • ([a-zA-Z0-9]+) - Group 5: one or more alphanumeric chars
  • $ - end of string.
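The breakdown above can be tried against the inputs from the question with a minimal Python sketch. It assumes + quantifiers on the Group 4 and Group 5 classes and two literal backslashes in each input; PATTERN and parse are hypothetical names, and Python's re module is used in place of the original engine.

```python
import re

# Assumption: Groups 4 and 5 use "+" (one or more), as the breakdown describes.
PATTERN = re.compile(
    r"^(type|tag):([a-zA-Z0-9]*)\\\\([0-9]{1,3})"
    r"(?:\?([a-zA-Z0-9]+)(?::([a-zA-Z0-9]+))?)?$",
    re.IGNORECASE,
)

def parse(line):
    """Return the five capture groups as a tuple, or None if the line is invalid."""
    m = PATTERN.match(line)
    return m.groups() if m else None

print(parse(r"type:test\\1?value12:value9"))  # ('type', 'test', '1', 'value12', 'value9')
print(parse(r"type:\\22?value62:value3"))     # ('type', '', '22', 'value62', 'value3')
print(parse(r"tag:abc\\123"))                 # ('tag', 'abc', '123', None, None)
print(parse(r"type:\\1p?hello:iii"))          # None
```

Because Group 2 uses *, a missing value is captured as an empty string rather than NULL, while Groups 4 and 5 come back as None when their optional parts are absent. To scan several lines held in one string, the re.MULTILINE flag would be needed so that ^ and $ apply to each line.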