Home > Software design >  How to correctly specify a quantifier for a group of words?
How to correctly specify a quantifier for a group of words?

Time:01-25

Table has field containing the list of IDs separated by "-".

Example: 559-3319-3537-4345-29923

I need to check rows that use at least 4 of the specified identifiers using regex

Example: before inserting to the db, I need to check the value 559-3319-3537-29923-30762 for this condition.

I've build a pattern that only works in the specified order, but if the IDs are swapped, it doesn't work.

Template: ^.*\b(-*(559|3319|3537|29923|30762)-*){4,}\b.*$

Initially, I thought that a simple (559|3319|3537|29923|30762){4,} should be enough, but in this case it also doesn't work, although it sees all 4 values without a quantifier.

Please tell me how to write such an expression correctly.

CodePudding user response:

Edit:

^.*\b((559|3319|3537|29923|30762)-?([0-9]*)?-?){4,}\b.*$

The reasoning behind this is that each group is not just one of the 5 numbers, but it can include some extra characters. So the matched groups in your example are:

  • (559-)
  • (3319-)
  • (3537-4345-)
  • (29923)

Original answer:

This would be one way to do it (not sure if there are other ways to do it):

^.*\b(559|3319|3537|29923|30762)[0-9-]*(559|3319|3537|29923|30762)[0-9-]*(559|3319|3537|29923|30762)[0-9-]*(559|3319|3537|29923|30762)\b.*$

CodePudding user response:

For ease of reading/testing, I've simplified the Ids being searched for to single digit integers 1-5. The following pattern will match strings with at least 4 out of the 5 ids:

(\b(1|2|3|4|5)\b.*){4,}

(Play with this here)

OR MySQL's regex dialect:

([[:<:]](1|2|3|4|5)[[:>:]].*){4,}

(Play with MySQL version here)

Here are some examples:

# Example Is Match? Description
1 1-2-3-4-5 YES All the Ids
2 1-2-3-9-5 YES Enough Ids
3 1-1-9-1-1 YES Enough Ids, but there are repeats
4 9-8-7-6-0 NO None of the Ids
5 1-2-3-9-9 NO Some, but not enough of the Ids

If the repeated Ids as shown in example 3 are an issue, then regex is probably not a good fit for this problem.

  • Related