Match first and then all equal occurrences with regex

Time:05-20

Let's say we have the string:

one day, when Anne, Lisa and Paul went to the store, then Anne said to Paul: "I love Lisa!". Then Lisa laughed and kissed Anne.

Is there a way with regex to match the first name, and then match all other occurrences of the same name in the string?

Given the name-matching regex /[A-Z][a-z]+/ (with /g maybe?), can the regex matcher be made to remember the first match, and then use that match EXACTLY for the rest of the string? Other subsequent matches to the name-matching regex should be ignored (except for Anne in the example).

The result would be (if matches are replaced with "Foo"):

one day, when Foo, Lisa and Paul went to the store, then Foo said to Paul: "I love Lisa!". Then Lisa laughed and kissed Foo.

Please ignore the fact that the sentence starts uncapitalized, or add an example that also handles this.

Using a script to get the first match and then using that as input for a second iteration works of course, but that's outside the scope of the question (which is limited to ONE regex expression).

CodePudding user response:

The only way I could think of is with non-fixed-width lookbehinds, for example through PyPI's regex module, and maybe JavaScript too? Either way, assuming a name is captured by [A-Z][a-z]+ as per your question, try:

\b([A-Z][a-z]+)\b(?<=^[^A-Z]*\b\1\b.*)

See an online demo


  • \b([A-Z][a-z]+)\b - A 1st capture group capturing a name between two word boundaries;
  • (?<=^[^A-Z]*\b\1\b.*) - A non-fixed-width positive lookbehind that matches the start-of-line anchor, followed by 0+ characters other than uppercase letters, followed by the content of the 1st capture group and then 0+ characters. In other words, a name is only matched if it is also the first capitalized word in the text.
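
To make the lookbehind's condition more concrete, here is a rough sketch that emulates it with Python's standard re module, which does not support non-fixed-width lookbehinds. It is only an illustration of what the lookbehind checks, not a single-regex solution: the helper name replace_first_name and the callback approach are my own assumptions, not part of the answer above.

```python
import re

# Same name pattern as the capture group in the answer.
NAME = re.compile(r"\b[A-Z][a-z]+\b")

def replace_first_name(text, replacement="Foo"):
    """Replace every occurrence of the first capitalized word in text."""
    def swap(match):
        name = match.group(0)
        # Emulate the lookbehind: the text from the start of the string up
        # to the end of this match must read "no uppercase letters, then
        # this same name, then anything".
        prefix = text[:match.end()]
        condition = r"[^A-Z]*\b" + re.escape(name) + r"\b.*"
        return replacement if re.fullmatch(condition, prefix) else name
    return NAME.sub(swap, text)

sentence = ('one day, when Anne, Lisa and Paul went to the store, then Anne '
            'said to Paul: "I love Lisa!". Then Lisa laughed and kissed Anne.')
print(replace_first_name(sentence))
```

For "Lisa" or "Paul" the condition fails, because the run of non-uppercase characters at the start of the text is followed by "Anne" rather than by the candidate name, so those matches are left alone.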

Here is an example using PyPI's regex module:

import regex as re

s = 'Anne, Lisa and Paul went to the store, then Anne said to Paul: "I love Lisa!". Then Lisa laughed and kissed Anne.'
s_new = re.sub(r'\b([A-Z][a-z]+)\b(?<=^[^A-Z]*\b\1\b.*)', 'Foo', s)
print(s_new)

Prints:

Foo, Lisa and Paul went to the store, then Foo said to Paul: "I love Lisa!". Then Lisa laughed and kissed Foo.
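
As a side note on why the example imports regex rather than the built-in module: as far as I know, Python's standard re only accepts fixed-width lookbehinds, so it should refuse to compile this pattern. A small check along these lines (the variable names are just for illustration) shows the difference:

```python
import re  # the standard library module, not PyPI's regex

PATTERN = r'\b([A-Z][a-z]+)\b(?<=^[^A-Z]*\b\1\b.*)'

try:
    re.compile(PATTERN)
    stdlib_accepts = True
except re.error as exc:
    # The lookbehind contains * quantifiers, so its width is not fixed.
    stdlib_accepts = False
    print("stdlib re rejected the pattern:", exc)

print("accepted by stdlib re:", stdlib_accepts)
```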