I am writing a regex to match only strings that do not contain any capital letters, digits, or certain special characters. So far, I have managed to match strings without capital letters, but words with accented capital letters are still matched.
Word.where.not("content ~* ?", "([0-9])|([.,?!])|^[A-Z][a-zA-Z]*")
The content column holds a single word as a string, e.g. "car", "Anatolia", "Érevan".
I want to match: "Érevan", "aNaTOLa", "J-core" but not "car" or "city-council".
Any idea which regex is appropriate? I tried to use :upper: but I guess I am doing something wrong, as it's not working. Thanks.
CodePudding user response:
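You can use the case-sensitive ~ operator (~* ignores case, so [A-Z] also matches lowercase letters) together with POSIX character classes: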
Word.where.not("content ~ ?", "([0-9])|([.,?!])|^[[:upper:]][[:alpha:]-]*$")
Here, ^[[:upper:]][[:alpha:]-]*$ matches
^ - start of string
[[:upper:]] - any uppercase letter
[[:alpha:]-]* - zero or more letters or hyphens
$ - end of string.
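To sanity-check the pattern outside the database, here is a minimal plain-Ruby sketch run against the sample words from the question. It is only an approximation: Ruby's regex engine is not PostgreSQL's, \A and \z are used in place of ^ and $ (which anchor at line boundaries in Ruby), and the EXCLUDE_PATTERN constant and excluded? helper are hypothetical names introduced for illustration.

```ruby
# Ruby analogue of the PostgreSQL pattern above (an approximation, not the same engine).
# \A and \z anchor at the start and end of the whole string.
EXCLUDE_PATTERN = /[0-9]|[.,?!]|\A[[:upper:]][[:alpha:]-]*\z/

# Hypothetical helper: true when the word would be filtered out by where.not.
def excluded?(word)
  EXCLUDE_PATTERN.match?(word)
end

# Sample words taken from the question.
%w[car city-council Anatolia Érevan J-core aNaTOLa].each do |word|
  puts "#{word}: #{excluded?(word) ? 'excluded' : 'kept'}"
end
```

Note that "aNaTOLa" is kept by this pattern, because the uppercase check is anchored to the first character. If words with an uppercase letter anywhere should be excluded, an unanchored [[:upper:]] alternative would likely be the simpler choice. How accented letters are classified in PostgreSQL depends on the database encoding and locale, so it is worth verifying against the real data.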
If there can be any char but whitespace in the "word", replace [[:alpha:]-] with \S or [^[:space:]].
If you do not care what kind of chars there are after the uppercase letter, use
Word.where.not("content ~ ?", "([0-9])|([.,?!])|^[[:upper:]]")
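The difference between the two variants can be illustrated with a small plain-Ruby sketch. It assumes Ruby's regex engine behaves closely enough to PostgreSQL's for these inputs; the STRICT and LOOSE constants, the compare helper, and the sample word "X_ray" are hypothetical and not part of the original query.

```ruby
# STRICT requires the whole word to be letters or hyphens after the capital;
# LOOSE only looks at the first character.
STRICT = /[0-9]|[.,?!]|\A[[:upper:]][[:alpha:]-]*\z/
LOOSE  = /[0-9]|[.,?!]|\A[[:upper:]]/

# Hypothetical helper: reports whether each variant would exclude the word.
def compare(word)
  { strict: STRICT.match?(word), loose: LOOSE.match?(word) }
end

p compare("X_ray")   # underscore is neither a letter nor a hyphen
p compare("Érevan")
p compare("car")
```

A word such as "X_ray" is excluded only by the looser variant, since the underscore stops the stricter pattern from matching the full string.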