What regular expression would match words for which the first and last letter are different?

Time:10-09

I know how to match a string having the same first and last character using:

/^(.).*\1$/

I want the opposite requirement for words: a regex to match words where the first and last letter are different. For example, abc should match since 'a' and 'c' are different, and bgb should fail since it begins and ends with 'b'.

I tried with /^(.).*(?!\1)$/, but it had both false positives and negatives (it matched when it shouldn't, and didn't match when it should).

What regex would match words where the first and last letters are different?

CodePudding user response:

A negative lookahead as you were attempting can be made to work:

^(.).*(?!\1).$

This pattern says to match:

  • ^ from the start of the string
  • (.) match any first character and capture it in \1
  • .* match zero or more additional characters
  • (?!\1) assert that the next character (the last one) is NOT \1
  • . match any last character (which cannot be \1)
  • $ end of the string

Here is a running demo.
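
As a quick offline check, here is a minimal Python sketch of the pattern above. It assumes Python's built-in re module (the question does not name an engine), and the helper name differs_first_last is made up for illustration. The question's original attempt is included for comparison: its lookahead sits at the very end of the string, where there is nothing left for \1 to match, so the assertion always succeeds.

```python
import re

# Pattern from the answer above: the lookahead is checked just before the last character.
WORKING = re.compile(r'^(.).*(?!\1).$')
# Attempt from the question: the lookahead is checked after every character is consumed.
ATTEMPT = re.compile(r'^(.).*(?!\1)$')

def differs_first_last(word: str) -> bool:
    """Hypothetical helper: True when the first and last characters differ."""
    return WORKING.search(word) is not None

samples = ["abc", "bgb", "ab", "aa", "a"]
results = {w: differs_first_last(w) for w in samples}
attempt_results = {w: ATTEMPT.search(w) is not None for w in samples}

print(results)          # per-word verdict of the working pattern
print(attempt_results)  # per-word verdict of the question's attempt
```

With these samples, the working pattern accepts only abc and ab, while the attempt accepts every sample, including bgb.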

CodePudding user response:

Firstly, we need a regex to match words. The simplest is to use the word character class, \w. Note that this matches not only letters but also digits and the underscore; if that isn't the appropriate character class, you'll need to substitute something else (such as the Unicode letter category \p{L}).

Next, a regex that will match a full word, which is fairly straightforward. Simply match a sequence of word characters, anchored by word boundaries:

\b\w+\b

Next, capture the first letter:

\b(\w)\w*\b

Finally, use a negative lookaround to reject a last letter that equals the backreference. With some engines, you can do this with a lookbehind (demo):

\b(\w)\w*(?<!\1)\b

When working with multi-character patterns that are broken down into sub-patterns, it's important to consider the quantifiers. Note that \w\w* is equivalent to \w+, and so will match words of 1 or more letters. The (?<!\1) applies at the end of the word (here, to the last letter). For 1-letter words, the first and last letter are the same character, so the pattern always fails, which is desirable (so 'a' will never be matched). For words of 2 or more letters, it compares letters in different positions, which is also desirable. Thus \w\w* works as a base pattern. Note that \w\w+ would also work.
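
To see the effect of word length, the lookbehind pattern can be run over a short sample. This is only a sketch in Python, whose re module (as far as I know, since version 3.5) accepts a fixed-width group reference inside a lookbehind; other engines may reject it. The sample text is invented for illustration.

```python
import re

# Lookbehind form: after the word is consumed, the character just before
# the closing word boundary must not equal the captured first letter.
LOOKBEHIND = re.compile(r'\b(\w)\w*(?<!\1)\b')

text = "abc bgb a xyx hello level go"

# finditer with group(0) yields whole words; findall would return only
# the captured first letter because the pattern contains one group.
matched = [m.group(0) for m in LOOKBEHIND.finditer(text)]
print(matched)
```

The 1-letter word a and the words that start and end with the same letter (bgb, xyx, level) are skipped.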

Some engines place restrictions on lookbehinds, such as not allowing backreferences in them. In this case, the pattern could be re-written to use a lookahead, placed before the last letter, which thus must be separated out (as the first letter was):

\b(\w)\w*(?!\1)\w\b

(demo)

Again, this should be examined in terms of length (left as an exercise).
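
One way to approach that exercise empirically is to try the lookahead pattern on words of each length. A small sketch, assuming Python's re module; is_match is a hypothetical helper name and the sample words are arbitrary.

```python
import re

# Lookahead form: the last letter is matched separately by the final \w,
# and (?!\1) is checked immediately before it.
LOOKAHEAD = re.compile(r'\b(\w)\w*(?!\1)\w\b')

def is_match(word: str) -> bool:
    """Hypothetical helper: True when the whole word satisfies the pattern."""
    return LOOKAHEAD.fullmatch(word) is not None

by_length = {w: is_match(w) for w in ["a", "ab", "aa", "abc", "aba"]}
print(by_length)
```

Because the first and last letters are now matched by separate \w tokens, the pattern needs at least 2 letters, so a 1-letter word cannot match at all.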

Finally, make sure to set any relevant flags, such as case-insensitivity.
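
As an illustration of why the flag matters, consider a capitalized word such as "Anna" (an example chosen here, not taken from the question). The sketch below assumes Python's re module, where re.IGNORECASE also makes the backreference comparison case-insensitive.

```python
import re

pattern = r'\b(\w)\w*(?!\1)\w\b'
word = "Anna"

# Without the flag, 'A' and 'a' count as different letters, so the word matches.
case_sensitive = re.search(pattern, word) is not None

# With the flag, \1 ('A') also matches 'a', so the lookahead rejects the word.
case_insensitive = re.search(pattern, word, re.IGNORECASE) is not None

print(case_sensitive, case_insensitive)
```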
