OK, I'm trying to capture text using a regex with the following rules:
- Each line starts with the word `type` or `tag`, and `:` comes after that. `type` or `tag` should be capture group 1.
- A varchar might come after `:`. That varchar should be capture group 2.
- `\\` comes after that.
- A number comes after `\\`. That number should be capture group 3.
- `?` might come after the number.
- If we have `?`, a varchar might come after the `?`. That varchar should be capture group 4.
- If we have `?` followed by a varchar, then `:` might come after that.
- If we have `?`, a varchar and `:`, then a varchar might come after that. That varchar should be capture group 5.
Examples:
type:test\\1?value12:value9 // Should get: Group 1 = type, Group 2 = test, Group 3 = 1, Group 4 = value12, Group 5 = value9
type:\\22?value62:value3 // Should get: Group 1 = type, Group 2 = NULL, Group 3 = 22, Group 4 = value62, Group 5 = value3
My regex is:
/(type|tag):([^\\]+)?\\\\([0-9]{1,3})?\??([^\:]+):([^\:]+)?/i
I believe it's not accurate. For example, with:
type:\\1p?hello:iii
the current regex matches `1` as Group 3 and `p?hello` as Group 4. However, it should not match this string at all: Group 3 must be a number, optionally followed by `?`, so `type:\\1p?hello:iii` doesn't follow the format that we want.
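To see the problem concretely, here is a small JavaScript sketch (assuming Node.js or a modern browser console) that runs the regex above, with the `+` quantifiers that appear to have been lost in the pasted text restored, against the first example and the invalid string. `groups` is a hypothetical helper; the strings use `String.raw` so that each contains exactly the two backslashes shown above.

```javascript
// The regex from the question, with the "+" quantifiers restored (assumed lost in the paste).
const askerRegex = /(type|tag):([^\\]+)?\\\\([0-9]{1,3})?\??([^\:]+):([^\:]+)?/i;

// String.raw keeps backslashes literal, so each string contains two backslashes.
const valid = String.raw`type:test\\1?value12:value9`;
const invalid = String.raw`type:\\1p?hello:iii`;

// Hypothetical helper: returns capture groups 1-5 of the first match, or null.
function groups(regex, input) {
  const m = regex.exec(input);
  return m ? m.slice(1) : null;
}

console.log(groups(askerRegex, valid));
// groups: type, test, 1, value12, value9
console.log(groups(askerRegex, invalid));
// groups: type, undefined, 1, p?hello, iii (matched, although it should not)
```

Because `([^\:]+)` accepts any character except `:`, Group 4 simply absorbs the stray `p?` instead of forcing the match to fail.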
Can anyone help, please? Thanks!
Answer:
Try this:
/(type|tag):(\w+)?\\\\([0-9]{1,3})?\??(\w+)?:?(\w+)?/gi
I think it's better to match word characters (`\w`) explicitly instead of just excluding other characters.
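A quick JavaScript check of this pattern, again with the `+` quantifiers restored and using a hypothetical helper `allMatches`, suggests that it extracts the five groups from the valid example. However, since it has no `^`/`$` anchors, it still finds a partial match, `type:\\1p`, inside the invalid string `type:\\1p?hello:iii`.

```javascript
// The pattern from this answer, with the "+" quantifiers restored (assumed lost in the paste).
const wordRegex = /(type|tag):(\w+)?\\\\([0-9]{1,3})?\??(\w+)?:?(\w+)?/gi;

// Hypothetical helper: lists every match with its five capture groups.
// matchAll requires the g flag and iterates over a copy of the regex,
// so lastIndex state does not leak from one call to the next.
function allMatches(input) {
  return [...input.matchAll(wordRegex)].map((m) => ({
    text: m[0],
    groups: m.slice(1),
  }));
}

console.log(allMatches(String.raw`type:test\\1?value12:value9`));
// one match covering the whole string; groups: type, test, 1, value12, value9
console.log(allMatches(String.raw`type:\\1p?hello:iii`));
// one partial match "type:\\1p"; groups: type, undefined, 1, p, undefined
```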
Answer:
You can use
/^(type|tag):([a-zA-Z0-9]*)\\\\([0-9]{1,3})(?:\?([a-zA-Z0-9]+)(?::([a-zA-Z0-9]+))?)?$/i
See the regex demo. Details:
- `^` - start of string
- `(type|tag)` - Group 1: `type` or `tag`
- `:` - a colon
- `([a-zA-Z0-9]*)` - Group 2: zero or more alphanumeric chars
- `\\\\` - two backslashes
- `([0-9]{1,3})` - Group 3: one, two or three digits
- `(?:\?([a-zA-Z0-9]+)(?::([a-zA-Z0-9]+))?)?` - an optional sequence of:
  - `\?` - a `?` char
  - `([a-zA-Z0-9]+)` - Group 4: one or more alphanumeric chars
  - `(?::([a-zA-Z0-9]+))?` - an optional sequence of:
    - `:` - a colon
    - `([a-zA-Z0-9]+)` - Group 5: one or more alphanumeric chars
- `$` - end of string.
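A minimal JavaScript sketch of using this pattern, assuming the input is a multi-line text that should be checked one line at a time. The hypothetical `parseLines` helper splits on newlines because `^` and `$` without the `m` flag anchor to the whole string. With the `+` quantifiers restored, it accepts both examples and rejects `type:\\1p?hello:iii`.

```javascript
// The anchored pattern from this answer, with the "+" quantifiers restored.
const strictRegex = /^(type|tag):([a-zA-Z0-9]*)\\\\([0-9]{1,3})(?:\?([a-zA-Z0-9]+)(?::([a-zA-Z0-9]+))?)?$/i;

// Hypothetical helper: validates each line separately and returns
// groups 1-5 for a valid line, or null for a line that breaks the format.
function parseLines(text) {
  return text.split(/\r?\n/).map((line) => {
    const m = strictRegex.exec(line);
    return m ? m.slice(1) : null;
  });
}

// String.raw keeps the two backslashes in each line literal.
const input = [
  String.raw`type:test\\1?value12:value9`,
  String.raw`type:\\22?value62:value3`,
  String.raw`type:\\1p?hello:iii`,
].join("\n");

for (const result of parseLines(input)) console.log(result);
// groups: type, test, 1, value12, value9
// groups: type, "", 22, value62, value3
// null
```

Note that with `([a-zA-Z0-9]*)`, Group 2 comes back as an empty string rather than `undefined` when nothing sits between `:` and the backslashes. Writing it as `([a-zA-Z0-9]+)?` instead would leave the group unset, which is closer to the NULL in the question's second example.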