Regex for length and variety of characters

Time:05-30

I would like to know if the following constraint can be checked with regex: "Must be at least 5 characters, of which 4 should be letters"

I know how to express the "Must be at least 5 characters" constraint, but I'm not sure about "of which 4 should be letters", or whether that is even possible with regex.

CodePudding user response:

Yes, it is possible. Please use the following regex.

(?=.{5,})\w*[a-z]\w*[a-z]\w*[a-z]\w*[a-z]\w*

Explanation

  • (?= Lookahead assertion - assert that the following regex matches
    • . Any character
    • {5,} Not less than 5 repetitions
  • ) Close lookahead
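  • \w*[a-z]\w*[a-z]\w*[a-z]\w*[a-z]\w* Four lowercase letters from a to z, each optionally preceded or followed by any number of word characters (\w: letters, digits, underscore)

NOTE: The (?=.{5,}) lookahead asserts that at least 5 characters follow. Because \w only matches word characters, the four letters must sit in one run of word characters; a space or punctuation mark between them prevents a match.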
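
To see how this pattern behaves on a few inputs, here is a minimal Python sketch. The helper name is_valid and the choice of re.fullmatch (which anchors the pattern at both ends) are assumptions for illustration, not part of the answer.

```python
import re

# Pattern copied unchanged from the answer above.
PATTERN = re.compile(r"(?=.{5,})\w*[a-z]\w*[a-z]\w*[a-z]\w*[a-z]\w*")


def is_valid(s: str) -> bool:
    """Return True if the whole string matches the pattern."""
    # fullmatch requires the match to span the entire string,
    # which has the same effect as anchoring with ^ and $.
    return PATTERN.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["abcd1", "abcd", "a1b2c", "ab-cd", "ABCDE"]:
        print(repr(candidate), is_valid(candidate))
```

Note that "ab-cd" and "ABCDE" are rejected: the hyphen is not a word character, and [a-z] is case-sensitive unless a case-insensitive flag is added.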

CodePudding user response:

You can also use this pattern:

(?i)(?=.*[a-z].*[a-z].*[a-z].*[a-z].*).{5,}

Here, the positive lookahead (?=.*[a-z].*[a-z].*[a-z].*[a-z].*) asserts that the string contains four letters, either adjacent or separated by other characters (case is ignored because of (?i)). Once that condition is met, .{5,} matches any string that is at least 5 characters long.
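
A small Python sketch of this pattern follows. The helper name is_valid and the use of re.fullmatch are assumptions for illustration. Python accepts the inline (?i) flag when it appears at the very start of the pattern, as it does here.

```python
import re

# Pattern copied unchanged from the answer above.
PATTERN = re.compile(r"(?i)(?=.*[a-z].*[a-z].*[a-z].*[a-z].*).{5,}")


def is_valid(s: str) -> bool:
    """Return True if the whole string has >= 5 characters, >= 4 of them letters."""
    return PATTERN.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["ab-cd", "ABCD!", "abc12", "abcd"]:
        print(repr(candidate), is_valid(candidate))
```

Unlike a pattern built from \w, this one accepts punctuation and spaces between the letters, because . matches any character except a newline.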

CodePudding user response:

What language or tool are you using?

This sounds like one of those things that doesn't need to be a single regex.

Here's "at least four letters"

[a-z].*[a-z].*[a-z].*[a-z]

and here's "at least five characters"

.{5,}

or even, if you're in a language like PHP, avoid regexes entirely and be more explicitly clear:

strlen($str) >= 5
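
The same two-step idea can be sketched in Python: one regex for "at least four letters" and a plain length check for "at least five characters". The names FOUR_LETTERS and is_valid are hypothetical.

```python
import re

# "At least four letters", as in the pattern above (lowercase a-z only).
FOUR_LETTERS = re.compile(r"[a-z].*[a-z].*[a-z].*[a-z]")


def is_valid(s: str) -> bool:
    """Check the length directly and use the regex only for the letter count."""
    # search() looks for the four letters anywhere in the string.
    return len(s) >= 5 and FOUR_LETTERS.search(s) is not None


if __name__ == "__main__":
    for candidate in ["ab-cd", "abcd", "a1b2c", "1a2b3c4d"]:
        print(repr(candidate), is_valid(candidate))
```

Keeping the two conditions separate makes each one easy to read and to change independently.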

CodePudding user response:

You can even do this without lookahead! Consider the following RegEx:

(.+[a-z].*[a-z].*[a-z].*[a-z].*)|(.*[a-z].+[a-z].*[a-z].*[a-z].*)|(.*[a-z].*[a-z].+[a-z].*[a-z].*)|(.*[a-z].*[a-z].*[a-z].+[a-z].*)|(.*[a-z].*[a-z].*[a-z].*[a-z].+)

Depending on your engine you may have to anchor this using ^ and $.

How it is generated: the + quantifier is simply shifted through every gap. The four letters are mandatory, but the fifth character can be at any position.
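
The shifting can be sketched in Python by generating the five alternatives, each with .+ in a different gap and .* in the others. The function names build_pattern and is_valid are hypothetical, and re.fullmatch stands in for anchoring with ^ and $.

```python
import re

LETTER = "[a-z]"


def build_pattern() -> str:
    """Build the lookahead-free alternation: one alternative per gap."""
    alternatives = []
    for plus_at in range(5):
        # Five gaps surround the four letters; exactly one gap uses ".+".
        gaps = [".+" if i == plus_at else ".*" for i in range(5)]
        parts = [gaps[0]]
        for gap in gaps[1:]:
            parts += [LETTER, gap]
        alternatives.append("(" + "".join(parts) + ")")
    return "|".join(alternatives)


PATTERN = re.compile(build_pattern())


def is_valid(s: str) -> bool:
    """Return True if the whole string matches one of the alternatives."""
    return PATTERN.fullmatch(s) is not None


if __name__ == "__main__":
    print(build_pattern())
    for candidate in ["abcde", "1abcd", "ab cd", "abcd", "abc12"]:
        print(repr(candidate), is_valid(candidate))
```

Every alternative needs four letters plus at least one more character, so no separate length check is required.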

If possible, you should avoid using RegEx for this though, or combine a RegEx that checks whether four letters are present (.*[a-z].*[a-z].*[a-z].*[a-z].*) with a simple length check.

If you need exactly four of the characters to be letters, replace . with [^a-z].

If you can use regular grammars, this can be written way shorter:

S → %aA | .S'
S' → %aA' | .S'
A → %aB | .A'
A' → %aB' | .A'
B → %aC | .B'
B' → %aC' | .B'
C → %aD | .C'
C' → %aD' | .C'
D → .D'
D' → .D' | ε

where S is the start symbol, . stands for any character and %a for any letter. Five states are needed to keep track of how many letters have been read; each state X also needs a state X' to keep track of whether a non-letter character has been read yet.
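
The state-tracking idea can be sketched as a small hand-written automaton in Python. This is a deterministic simplification, not a literal translation of the grammar: it counts up to four letters and treats every other character (including further letters) as the extra one. It also assumes that additional characters after the fifth are acceptable and that %a means lowercase a to z. The function name accepts is hypothetical.

```python
def accepts(s: str) -> bool:
    """Accept strings with at least four letters and at least five characters."""
    letters = 0    # corresponds to the states S, A, B, C, D (0 to 4 letters read)
    extra = False  # corresponds to the primed states (an extra character was read)
    for ch in s:
        if "a" <= ch <= "z" and letters < 4:
            # Consume a letter and advance to the next letter-count state.
            letters += 1
        else:
            # Any other character, or a fifth letter, counts as the extra one.
            extra = True
    # Accepting state: four letters read and one extra character seen.
    return letters == 4 and extra


if __name__ == "__main__":
    for candidate in ["abcde", "ab1cd", "abcd", "abc12", "abcdef"]:
        print(repr(candidate), accepts(candidate))
```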
