Match words that start with a sequence of 3 letters that also appears in the same order later in the

Time:08-24

I am trying to match these words with regex:

match              don't match
allochirally       anticker
anticovenanting    corundum
barbary            crabcatcher
calelectrical      damnably
entablement        foxtailed
ethanethiol        galvanotactic
froufrou           gummage
furfuryl           gurniad
galagala           hypergoddess
heavyheaded        kashga
linguatuline       nonimitative
mathematic         parsonage
monoammonium       pouchlike
perpera            presumptuously
photophonic        pylar
purpuraceous       rachioparalysis
salpingonasal      scherzando
testes             swayed
trisectrix         unbridledness
undergrounder      unupbraidingly
untaunted          wellside

As you can tell, there is a pattern in the match column: the first three letters of every word appear again, in the same order, later in the word.

CodePudding user response:

"Finding" a certain string, as you say, can be operationalized as "extracting". If your goal, then, is to extract the three letters that get repeated within the words, you can use this pattern:

(\w{3})(?=.*\1)

If the words contain only alphabetic characters, this can be simplified to:

(.{3})(?=.*\1)

The syntax here relies on two elements:

  • (.{3}): a capture group matching exactly three characters
  • (?=.*\1): a lookahead asserting that the same sequence of three characters occurs again after zero or more intervening characters
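
As a rough illustration, here is how the pattern might be used in Python. This is only a sketch under two assumptions that are not part of the answer above: the standard `re` module is used, and a leading `^` is added so that the capture group is pinned to the first three characters of the word (without it, any repeated three-character sequence inside the word would qualify). The helper name `repeated_prefix` is made up for this example.

```python
import re

# Pattern from the answer, plus an assumed leading ^ so that the capture
# group can only be the first three characters of the word.
PATTERN = re.compile(r"^(.{3})(?=.*\1)")

def repeated_prefix(word):
    """Return the three-letter prefix if it occurs again later, else None."""
    m = PATTERN.match(word)
    return m.group(1) if m else None

print(repeated_prefix("barbary"))   # bar
print(repeated_prefix("anticker"))  # None
```

Because the lookahead starts right after the captured group, the second occurrence cannot overlap the first three letters.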

CodePudding user response:

The following regex can be used to match the words:

([a-zA-Z]{3})[a-zA-Z]*\1[a-zA-Z]*

([a-zA-Z]{3}) - Groups first 3 letters
[a-zA-Z]* - Allows any letters to be matched 
\1 - Matches the text captured by the first group again (the same 3 letters)
[a-zA-Z]* - Allows any letters to be matched 
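
A possible way to try this regex on a few of the words from the question, sketched in Python. It assumes `re.fullmatch`, which requires the pattern to cover the whole word and therefore forces the captured group to be the first three letters. The function name `starts_with_repeated_triple` and the two sample lists are illustrative only.

```python
import re

# The regex from this answer. With re.fullmatch the pattern must span the
# whole word, so group 1 is always the first three letters.
WORD = re.compile(r"([a-zA-Z]{3})[a-zA-Z]*\1[a-zA-Z]*")

def starts_with_repeated_triple(word):
    """True if the word's first three letters occur again later in the word."""
    return WORD.fullmatch(word) is not None

# A few words taken from the two columns in the question.
match_words = ["allochirally", "barbary", "testes", "undergrounder"]
no_match_words = ["anticker", "corundum", "kashga", "wellside"]

hits = [w for w in match_words + no_match_words if starts_with_repeated_triple(w)]
print(hits)  # ['allochirally', 'barbary', 'testes', 'undergrounder']
```

If the regex is used with a search function instead, adding a `^` or `\b` in front of the group should have a similar anchoring effect.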