Regex: Replace certain part of the matched characters

Time:03-05

I want to match text up to a certain point and keep only certain parts of the match. For example:

June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin

should turn into:

Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin

So, everything up to the end of the word Article should be matched; then only the first three characters of the month are kept, along with the year, and this replaces the matched characters.

My strong regex knowledge got me as far as:

.+Article(?)

CodePudding user response:

You could use 2 capture groups and use those in a replacement:

\b([A-Z][a-z]+)[a-z](\s+\d{4})\b.*?\bArticle\b
  • \b A word boundary to prevent a partial word match
  • ([A-Z][a-z]+) Capture group 1, match a single uppercase char and 1+ lowercase chars
  • [a-z] Match a single char a-z
  • (\s+\d{4})\b Capture group 2, match 1+ whitespace chars and 4 digits followed by a word boundary
  • .*?\bArticle\b Match as few chars as possible until Article


The replaced value will be

Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
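
As a rough illustration, here is a minimal Python sketch of this substitution. It assumes the pattern's quantifiers are `[a-z]+` and `\s+`, and the function name `shorten_header` is hypothetical. Python writes the group references in the replacement as `\1\2`. Note that the pattern drops only the last letter of the month, so it gives a three-letter abbreviation for a four-letter month such as June, but "January" would become "Januar".

```python
import re

# Assumed form of the pattern: group 1 is the month minus its last letter,
# group 2 is the whitespace plus the four-digit year.
PATTERN = re.compile(r"\b([A-Z][a-z]+)[a-z](\s+\d{4})\b.*?\bArticle\b")


def shorten_header(line: str) -> str:
    """Replace everything up to and including 'Article' with groups 1 and 2."""
    return PATTERN.sub(r"\1\2", line, count=1)


print(shorten_header("June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin"))
# prints: Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
```

If months of any length need to be cut to three letters, one possible variation is to write group 1 as `[A-Z][a-z]{2}` and follow it with `[a-z]*`.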

CodePudding user response:

You could use positive lookbehinds and replace each match with an empty string:

(?<=^[A-Z][a-z]{2})[a-z]*|(?<=\d{4}).*Article
  • (?<=^[A-Z][a-z]{2}) - behind me is the start of a line and 3 chars; presumably the first three chars of the month
  • [a-z]* - optionally, match the rest of the month
  • | - or
  • (?<=\d{4}) - behind me is 4 digits; presumably a year
  • .*Article - match everything leading up to and including "Article"

https://regex101.com/r/fbYdpH/1
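
A minimal Python sketch of the lookbehind approach, assuming each match is replaced with an empty string; the function name `strip_header` is hypothetical. Python's `re` module only accepts fixed-width lookbehinds, which both of these are.

```python
import re

# Two alternatives, both deleted: the tail of the month name after its first
# three letters, and the text from just after the year through "Article".
PATTERN = re.compile(r"(?<=^[A-Z][a-z]{2})[a-z]*|(?<=\d{4}).*Article")


def strip_header(line: str) -> str:
    """Remove both matched parts, leaving 'Mon YYYY' plus the rest of the line."""
    return PATTERN.sub("", line)


print(strip_header("June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin"))
# prints: Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
```

Because `[a-z]*` consumes the whole remainder of the month, this version shortens months of any length. The greedy `.*Article` runs to the last "Article" on the line, which matters only if the title itself contains that word.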
