I want to match text that meets a certain condition and keep only certain parts of that match. For example:
June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin
should turn into:
Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
So everything up to the end of the word Article should be matched. Of that match, only the first three characters of the month and the year should be kept, and they should replace the matched characters.
My strong regex knowledge got me as far as:
.+Article(?)
Answer:
You could use two capture groups and put them back in the replacement:
\b([A-Z][a-z]+)[a-z](\s+\d{4})\b.*?\bArticle\b
\b
- A word boundary to prevent a partial word match
([A-Z][a-z]+)
- Capture group 1, match a single uppercase char and 1+ lowercase chars (the part of the month that is kept)
[a-z]
- Match a single char a-z (after backtracking, the last letter of the month, which is dropped)
(\s+\d{4})\b
- Capture group 2, match 1+ whitespace chars and 4 digits, followed by a word boundary
.*?\bArticle\b
- Match as few chars as possible until the word Article
Replacing the match with group 1 followed by group 2 gives:
Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
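One quick way to check the capture-group answer is Python's `re.sub`, which writes the two groups back with `\1\2`. This is a sketch assuming Python 3; the `shorten` helper and the `THREE_LETTER_PATTERN` variant are hypothetical additions, not part of the answer. The variant is there because `([A-Z][a-z]+)` followed by `[a-z]` only drops the last letter of the month, which leaves three letters for "June" but not for longer names such as "January".

```python
import re

# Pattern from the answer: group 1 keeps the capital plus all but the last
# letter of the month, group 2 keeps the whitespace and the 4-digit year.
ANSWER_PATTERN = re.compile(r"\b([A-Z][a-z]+)[a-z](\s+\d{4})\b.*?\bArticle\b")

# Hypothetical variant (not from the answer): keep exactly three letters of
# the month and drop the rest, whatever the month's length.
THREE_LETTER_PATTERN = re.compile(r"\b([A-Z][a-z]{2})[a-z]*(\s+\d{4})\b.*?\bArticle\b")


def shorten(line: str, pattern: re.Pattern) -> str:
    # Replace the first match with group 1 followed by group 2.
    return pattern.sub(r"\1\2", line, count=1)


if __name__ == "__main__":
    line = "June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin"
    print(shorten(line, ANSWER_PATTERN))        # Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
    print(shorten(line, THREE_LETTER_PATTERN))  # Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
    print(shorten("January 2021 9 Feature Article X", ANSWER_PATTERN))        # Januar 2021 X
    print(shorten("January 2021 9 Feature Article X", THREE_LETTER_PATTERN))  # Jan 2021 X
```

Swapping `([A-Z][a-z]+)[a-z]` for `([A-Z][a-z]{2})[a-z]*` is one way to keep exactly three letters for any month; the rest of the pattern stays the same.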
Answer:
You could use positive lookbehinds and replace the matches with an empty string:
(?<=^[A-Z][a-z]{2})[a-z]*|(?<=\d{4}).*Article
(?<=^[A-Z][a-z]{2})
- behind me is the start of the string (of each line in multiline mode) and 3 chars; presumably the first three chars of the month
[a-z]*
- optionally, match the rest of the month
|
- or
(?<=\d{4})
- behind me is 4 digits; presumably a year
.*Article
- match everything up to and including "Article" (greedy, so up to the last "Article" on the line)
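The lookbehind answer can be checked the same way; here the matched text is simply deleted with an empty replacement. Again this is a Python 3 sketch, and `strip_parts` is a hypothetical helper name. Python's `re` accepts these lookbehinds because each one has a fixed width (three letters, four digits).

```python
import re

# Pattern from the answer: the first alternative matches the rest of the month
# after its first three letters; the second matches everything after the
# 4-digit year up to and including the last "Article".
LOOKBEHIND_PATTERN = re.compile(r"(?<=^[A-Z][a-z]{2})[a-z]*|(?<=\d{4}).*Article")


def strip_parts(line: str) -> str:
    # Deleting every match keeps only the text outside the matches.
    return LOOKBEHIND_PATTERN.sub("", line)


if __name__ == "__main__":
    print(strip_parts("June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin"))
    # Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
    print(strip_parts("January 2021 9 Feature Article X"))  # Jan 2021 X
    print(strip_parts("June 2021 9 Feature Article Three Article Lin"))  # Jun 2021 Lin
```

Since `[a-z]*` removes the whole rest of the month, "January" also becomes "Jan" here. The last call shows the effect of the greedy `.*Article`: when the title itself contains "Article", everything up to the last occurrence is removed, whereas `.*?Article` would stop at the first one.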