how to manage duplicated characters in regex

Time:03-29

I'm using this regex to find ALL of the following occurrences in an array:

/^.*(?=.*T)(?=.*O)(?=.*T)(?=.*A).*$/

it matches

pOTATO
mATTO
cATeTO

but also

lATO
minAreTO
AnTicO

although these last three words have just one T.

How can I extract only the words containing at least two Ts, one A and one O, in any order?

CodePudding user response:

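Lookarounds "stand their ground": they are zero-width and consume no text, so once the first lookahead has been tried, every subsequent lookahead is checked from exactly the same position. In your pattern, both (?=.*T) lookaheads are therefore satisfied by the same T, and the second one adds no extra requirement.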
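To illustrate the point, here is a minimal sketch that runs the pattern from the question against the six words listed there. The variable names (sampleWords, questionPattern, and so on) are only illustrative, and the second pattern is the same one with the duplicated (?=.*T) dropped, added here purely for comparison.

```javascript
// The six words from the question.
const sampleWords = ["pOTATO", "mATTO", "cATeTO", "lATO", "minAreTO", "AnTicO"];

// Pattern from the question: (?=.*T) appears twice.
const questionPattern = /^.*(?=.*T)(?=.*O)(?=.*T)(?=.*A).*$/;

// Same pattern with the duplicated lookahead removed (for comparison only).
const withoutDuplicate = /^.*(?=.*T)(?=.*O)(?=.*A).*$/;

const matchedByQuestionPattern = sampleWords.filter((w) => questionPattern.test(w));
const matchedWithoutDuplicate = sampleWords.filter((w) => withoutDuplicate.test(w));

// Both lists contain all six words: the repeated (?=.*T) is satisfied
// by the same T, so it does not require a second one.
console.log(matchedByQuestionPattern);
console.log(matchedWithoutDuplicate);
```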

You need to use either of the following:

/^(?=.*T.*T)(?=.*O)(?=.*A).*/
/^(?=.*T[^T]*T)(?=.*O)(?=.*A).*/

Note the missing .* after ^: it is not necessary, because it is enough to fire the lookaheads once, at the start of the string. (?=.*T.*T) requires two repetitions of "zero or more chars other than line break chars, as many as possible, followed by a T char". (?=.*T[^T]*T) requires zero or more chars other than line break chars, as many as possible, then a T, then zero or more chars other than T, and then another T.
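As a quick check, the two patterns can be applied to the words from the question. This is only a sketch: the array and variable names are made up for the example, and Array.prototype.filter with RegExp.prototype.test is just one way to apply the patterns to an array.

```javascript
// The six words from the question.
const candidates = ["pOTATO", "mATTO", "cATeTO", "lATO", "minAreTO", "AnTicO"];

// Two Ts anywhere on the line, plus one O and one A, in any order.
const twoTsDot = /^(?=.*T.*T)(?=.*O)(?=.*A).*/;

// Same idea, with a negated character class between the two Ts.
const twoTsNegated = /^(?=.*T[^T]*T)(?=.*O)(?=.*A).*/;

const keptByDot = candidates.filter((w) => twoTsDot.test(w));
const keptByNegated = candidates.filter((w) => twoTsNegated.test(w));

console.log(keptByDot);     // [ 'pOTATO', 'mATTO', 'cATeTO' ]
console.log(keptByNegated); // [ 'pOTATO', 'mATTO', 'cATeTO' ]
```

Both patterns are case-sensitive, so a lowercase t does not count toward the two Ts.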

See regex demo #1 and regex demo #2. Note that (?=.*T[^T]*T) can match more than (?=.*T.*T), since [^T] can match line break chars while . cannot. To avoid that in the demo, I added \n to the negated character class, i.e. [^T\n].
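The line break difference only shows up on multi-line input. The sketch below uses a made-up two-line string (not taken from the question) in which the second T sits on the second line; the variable names are illustrative.

```javascript
// Hypothetical input: A, O and one T on the first line, a second T on the next line.
const twoLineInput = "AOT\nT";

const dotVersion = /^(?=.*T.*T)(?=.*O)(?=.*A).*/;
const negatedVersion = /^(?=.*T[^T]*T)(?=.*O)(?=.*A).*/;
const negatedNoNewline = /^(?=.*T[^T\n]*T)(?=.*O)(?=.*A).*/;

const lineBreakResults = {
  // "." cannot cross the line break, so the second T is out of reach.
  dot: dotVersion.test(twoLineInput),
  // [^T] matches "\n", so the lookahead reaches the T on the second line.
  negated: negatedVersion.test(twoLineInput),
  // Excluding "\n" from the class restores the single-line behavior.
  negatedNoNewline: negatedNoNewline.test(twoLineInput),
};

console.log(lineBreakResults);
// { dot: false, negated: true, negatedNoNewline: false }
```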
