How to make a regex capturing group optional but in pairs?

Time:12-22

I'm trying to capture the following situations:

John;123 = John and 123
`John;123` = John and 123
"John;123" = John and 123
John;John;123 = John;John and 123
`John;123 = `John and 123
John;123' = No capture

So I have the following regex pattern: (?:'|")(.*);([0-9].+)(?:'|") which handles the quotation marks and the semicolon-separated capturing groups well.

But I'm having trouble making the quotation marks optional as a pair. That is, either both outer quotation marks count as delimiters or neither does.

If they don't count as delimiters, they are part of the name and should be captured with it rather than consumed outside the capturing group.

I tried to make them optional as follows: (?:'|")?(.*);([0-9].+)(?:'|")? but then the input is wrongly captured as:

`John;123 = John and 123

when it should be

`John;123 = `John and 123

Since the quotation mark is not paired, it is part of the name.

Any ideas on this one?

CodePudding user response:

This is as close as I could get it. Not sure if there is a nicer version, but this seems to work.

I essentially join two different regex patterns with an OR statement |, because it seems there is more than one pattern at play, thus making this particular requirement very challenging.

The first pattern looks for patterns within quotation marks, and the second pattern looks for those outside of quotation marks.

The first pattern looks like this:

^(?>`|")([A-Za-z]+)(?>;)([0-9]+)(?>`|")$

The second pattern looks like this:

^(`?[A-Za-z]+)(?>;)([A-Za-z]+)?;?([0-9]+)$


And when you combine both of them together with an OR statement, you get the following:

Final Regex:

^(?>`|")([A-Za-z]+)(?>;)([0-9]+)(?>`|")$|^(`?[A-Za-z]+)(?>;)([A-Za-z]+)?;?([0-9]+)$

See Demo.
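
Because the two branches use different group numbers, the caller has to check which branch matched. Here is a minimal Python sketch of that, not part of the original answer. It writes the pattern with `+` quantifiers and swaps the atomic groups `(?>...)` for non-capturing groups `(?:...)`, since Python's `re` module only accepts atomic groups from version 3.11 on; for single-character alternatives the two behave the same. The helper name `split_entry` is hypothetical, and rejoining the two name parts with `;` is an assumption about the desired output.

```python
import re

# Combined pattern: quoted branch first, then the unquoted branch.
PATTERN = re.compile(
    r'^(?:`|")([A-Za-z]+)(?:;)([0-9]+)(?:`|")$'
    r'|^(`?[A-Za-z]+)(?:;)([A-Za-z]+)?;?([0-9]+)$'
)


def split_entry(text):
    """Return (name, number), or None when neither branch matches."""
    m = PATTERN.match(text)
    if m is None:
        return None
    if m.group(1) is not None:
        # Quoted branch matched: the values are in groups 1 and 2.
        return m.group(1), m.group(2)
    # Unquoted branch matched: groups 3 and 5, plus an optional
    # second name part in group 4.
    name = m.group(3)
    if m.group(4) is not None:
        name += ";" + m.group(4)
    return name, m.group(5)


for sample in ["John;123", "`John;123`", "John;John;123", "`John;123", "John;123'"]:
    print(repr(sample), "->", split_entry(sample))
```

Note that this pattern only accepts letters in the name and at most two name parts, so it is narrower than the examples in the question suggest.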

CodePudding user response:

You might use a capture group with a backreference to match up any of the quotes, if you don't allow any of the quotes in the capture groups that you want.

The capture group values of interest are in group 2 and in group 3.

(?<!\S)([`"']?)([^\s;]+(?:;[^\s;`"']+)*);([^\s;`"']+)\1(?!\S)

Regex demo
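
As a rough illustration of the backreference approach, the following Python sketch runs the pattern (written with `+` quantifiers) against the inputs from the question. The helper name `parse` is hypothetical and not part of the answer. Group 1 holds the opening quote or the empty string, and `\1` requires the same text again at the end, which is what forces the quotes to appear as a pair.

```python
import re

# Group 1: optional opening quote; \1 must repeat it at the end.
# Group 2: the name (may contain further ;-separated parts).
# Group 3: the value after the last semicolon.
PATTERN = re.compile(
    r"""(?<!\S)([`"']?)([^\s;]+(?:;[^\s;`"']+)*);([^\s;`"']+)\1(?!\S)"""
)


def parse(text):
    """Return (name, value) from groups 2 and 3, or None if there is no match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group(2), m.group(3)


print(parse("John;123"))       # ('John', '123')
print(parse("`John;123`"))     # ('John', '123')
print(parse("John;John;123"))  # ('John;John', '123')
print(parse("`John;123"))      # ('`John', '123')
print(parse("John;123'"))      # None
```

For the unpaired `` `John;123 `` case, the engine first tries the backtick as group 1, fails to find it again at the end, and then backtracks to an empty group 1 so that the backtick is absorbed into the name.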
