RegEx: Any string must contain at least N chars from a specific list of chars

Time:09-11

I'm very new to learning RegEx, and need a little help updating what I have. I will use it to evaluate student spreadsheet functions. I know it isn't perfect, but I'm trying to use this as a stepping stone to a better understanding of RegEx. I currently have [DE45\\+\\s]*$ but it does not validate criterion #4 below. Any help is greatly appreciated.

I need to validate an input so that it matches these four criteria:

  1. Letters D and E (uppercase, in any order, in a string of any length)
  2. Numbers 4 and 5 (in any order, in a string of any length)
  3. Special characters: comma (,) and plus (+) (in any order, in a string of any length)
  4. All six characters DE45+, must be present in the string at least once.

Results

  • pass: =if(D5>0,E4+D5,0)
  • pass: =if(D5>0,D5+E4,0)
  • fail: Dad Eats @ 05:40
  • pass: Dad, Eats+Drinks @ 05:40
  • fail: =if(E4+D5)
  • pass: DE45+,

CodePudding user response:

The attempt you made -- with a character class -- will not work since [DE45] matches any single character in the class -- not all of them.

This type of problem is solved with a series of anchored lookaheads where all of these need to be true for a match at the anchor:

^(?=.*D)(?=.*E)(?=.*\+)(?=.*4)(?=.*5)(?=.*,)
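
As a rough sketch of how that pattern could be used from Python: the names `REQUIRED_CHARS_RE` and `has_all_required` are made up for illustration, and the sample strings assume the formulas in the question contain `+` signs. Because every lookahead is anchored at the start of the string, a single `re.search` call is enough.

```python
import re

# Hypothetical name: one lookahead per required character, all anchored at ^.
REQUIRED_CHARS_RE = re.compile(r'^(?=.*D)(?=.*E)(?=.*\+)(?=.*4)(?=.*5)(?=.*,)')

def has_all_required(text):
    """Return True if text contains D, E, 4, 5, '+' and ',' at least once each."""
    return REQUIRED_CHARS_RE.search(text) is not None

# A few of the sample strings from the question.
samples = ['=if(D5>0,E4+D5,0)', 'Dad Eats @ 05:40', '=if(E4+D5)', 'DE45+,']
for s in samples:
    print(f'{s!r}: {has_all_required(s)}')
```

Note that `.` does not match a newline by default, so for multi-line input this sketch would need the `re.DOTALL` flag.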

Also, depending on the language, you can chain logic with regex matches. In Perl for example you would do:

/D/ && /E/ && /\+/ && /4/ && /5/ && /,/

In Python:

all(re.search(p, a_str) for p in [re.escape(c) for c in 'DE45+,'])

Of course easier still is to use a language's set functions to test that all required characters are present.

Here in Python:

set(a_str) >= set('DE45+,')

This returns True only if all the characters in 'DE45+,' are in a_str.
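
Since the goal is to evaluate student formulas, the same set idea can also report which characters are missing. This is only a sketch: `missing_chars` is a hypothetical helper name, and the default `required` string assumes the six characters from the question.

```python
def missing_chars(text, required='DE45+,'):
    """Return the required characters that do not appear in text, sorted."""
    # Set difference keeps only the required characters absent from text.
    return sorted(set(required) - set(text))

print(missing_chars('=if(E4+D5)'))   # [','] because the comma is absent
print(missing_chars('DE45+,'))       # [] because nothing is missing
```

An empty list means the string passes; anything else tells you what to flag.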

CodePudding user response:

A Regular Expression character class (in the square brackets) is an OR search. It will match if any one of the characters in it is present, which does not allow you to verify #4.

For that you could build on top of a regex, as follows:

  1. Find all instances of any of the characters you're looking for with a simple character class search (findall using [DE45+,]+)
  2. Merge all the found characters into one string (join)
  3. Do a set comparison with set('DE45+,'). This will only be True if all the characters are present, in any amount and in any order (set)

set(''.join(re.findall(r'[DE45+,]+','if(D5>0,4+D5,0)E'))) == set('DE45+,')

You can generalize this for any set of characters:

import re

lookfor = 'DE45+,'
lookfor_re = re.compile(f'[{re.escape(lookfor)}]+')
strings = ['=if(D5>0,E4+D5,0)', '=if(D5>0,D5+E4,0)', 'Dad Eats @ 05:40', 'Dad, Eats+Drinks @ 05:40', '=if(E4+D5)', 'DE45+,']
for s in strings:
    found = set(''.join(lookfor_re.findall(s))) == set(lookfor)
    print(f'{s} : {found}')

Just set lookfor to a string containing each of the characters you're looking for, and strings to a list of the strings to search. You don't need to escape any special characters with \ yourself; re.escape does that for you here. The script prints:

=if(D5>0,E4+D5,0) : True
=if(D5>0,D5+E4,0) : True
Dad Eats @ 05:40 : False
Dad, Eats+Drinks @ 05:40 : True
=if(E4+D5) : False
DE45+, : True